
1 Introduction

Traditional NLP models can achieve accurate predictions by exploiting statistical correlations within data. However, the performance of these conventional methods depends mainly on the data distribution of the training and testing datasets. For this reason, analyzing causal relationships, which take the data generating process into account, helps create robust models [8, 31]. More specifically, causal inference is a way of generating counterfactual explanations for hypothetical scenarios, such as how the outcome variable is affected by an intervention on a treatment variable. Causal inference has been applied to reason about such imaginary situations in several fields, but its practical applications in NLP have only recently started to gain attention.

The cause-effect relationships of linguistic properties can be examined using causal inference by measuring the change in the outcome resulting from an intervention on a treatment. Under such a hypothetical scenario, the potential outcomes can be estimated provided that the ignorability, positivity, and consistency assumptions are satisfied (details are given in Sect. 3.1). NLP applications usually rely on observational data, so randomly assigning texts to treatments is not feasible. In other words, to satisfy the ignorability assumption while assigning treatment in observational studies, there must not be any unobserved confounders (variables that affect both the treatment and the outcome). Identification is another key aspect of causal inference for NLP, which suggests that linguistic properties can be expressed using proxy labels [22, 32, 44]. Additionally, it is assumed that proxy labels can estimate the ground-truth causal relation of linguistic properties.

Many state-of-the-art NLP models can be considered black-box models, which receive text documents as input and generate a task-dependent output. Therefore, explaining and intervening in the predictions of such models remains a challenging problem [5, 8, 35]. Some studies have examined the applicability of causal methods to interpreting black-box NLP models by generating counterfactual statements [26]. These works can be classified into a data perspective [38] and a model component perspective [14, 27], where counterfactual statement generation is an example of the former and exploiting network artifacts an example of the latter.

In this study, we focus on the irony and sarcasm detection problem and explore text-based causal inference, using the TextCause algorithm [32] to measure the causal effect of linguistic properties on this problem. Irony and sarcasm are forms of verbal expression in which the intended meaning is conveyed by stating just the opposite. The detection problem is therefore inherently difficult, and analyzing the causal relationships can provide insight into the explainability of the generated models and help improve detection performance. The main contributions of this study can be summarized as follows:

  • The causal effect of linguistic properties is examined in irony and sarcasm detection tasks using the TextCause algorithm [32].

  • Latent confounders within text documents are modeled using K-Means clustering and LDA topic modeling, and their effects on the causal inference are analyzed.

  • The obtained results provide insight in terms of causal interpretability and explainability.

2 Related Works

This study is built on top of the TextCause algorithm proposed by Pryzant et al. [32]. The authors use the DistilBERT [39] language model to adjust for text and are inspired by Veitch et al.'s CausalBERT study [43], which adapts BERT to adjust for text as a confounder. Additionally, they generate causal embeddings using causal topic models adapted from Blei et al. [1]. Keith et al. [17] summarize methods for adjusting for text in causal inference. Fong et al. [11] discuss the assumptions required to use latent features of text as treatment; in another study, they also use topic modeling to discover latent treatments in texts [10]. Moreover, Wood-Doughty et al. [46] address the challenges of using proxy treatments for causal inference.

Recently, Yang et al. [47] conducted a survey of existing causality extraction methods for texts. Feder et al. [8] provide a review of the use-cases of text-based causal inference and discuss fairness, interpretability, and robustness aspects. Text can play the role of treatment [32, 48], confounder [17, 43], outcome [7], or even mediator [18]. Sridhar et al. [40] examine the causal effect of tone on online debates. Koroleva et al. [21] propose a model that uses BERT-based language models to measure the semantic similarity of pairs of clinical trial outcomes and reports.

There exist comprehensive studies that review models for explaining black-box NLP models [5, 8, 26]. More recently, Chou et al. [4] provide an in-depth review of studies on model-agnostic counterfactual algorithms and argue that many such studies do not rely on causal theoretical formalism. Wang et al. [45] utilize a causal approach to exploit the attention weights of a sentiment classifier. Perturbation-based approaches [23, 35] have also been used for explanation. Another prominent and challenging text-based causal explanation method is counterfactual statement generation [12, 36, 38], which requires manipulating text in a meaningful manner. Therefore, instead of modifying the text itself, changing its representation has emerged as an alternative [9, 33]. Besides, Buyukbas et al. [2] work on the same Turkish tweet dataset as this study and examine the explainability of transformer architectures for the irony detection task using two popular explainability tools, LIME [35] and SHAP [23]. Likewise, Hazarika et al. [15] propose the CASCADE model, which utilizes both contextual and content information to significantly improve sarcasm detection performance on the SARC [19] dataset.

3 Background

3.1 Causal Inference

Typical NLP models make decisions using statistical associations and estimate the dataset's distribution from the training data. Causal inference, on the other hand, is an inverse problem that aims to recover the structural causal model of the data generating process, which leads to more robust and invariant models. Causal inference is about answering counterfactual queries based on the intervention of interest; however, the counterfactual outcomes are in most cases absent from observational data. The causal effect is the change in the outcome variable Y caused by an intervention on the treatment X when all other covariates are kept constant.

The initial step of causal inference is representing the associations between variables as Structural Causal Models (SCMs). An SCM consists of a directed acyclic graph (DAG) and a mathematical formulation of the problem: variables are represented as nodes, and edges represent the causal relationships between them.

Definition 1 (Structural Causal Model)

A structural causal model is a 3-tuple (U, V, E), where U denotes a set of exogenous variables (determined by factors outside the system), V denotes a set of endogenous variables (dependent on other variables in the system), and E is a set of structural equations, each of which defines an endogenous variable in terms of U and V.

After representing the causal model as a graph, interventions on a treatment can be expressed using Pearl's do-calculus notation [30]. The three rules of do-calculus, which allow interventions on the treatment to be simulated in order to identify causal relationships in DAGs, are summarized below:

  • Rule 1: Insertion and deletion of observations

    $$\begin{aligned} P(Y \mid do(X), Z, W) = P(Y \mid do(X), Z), \quad \text {if } W \text { is irrelevant to } Y \end{aligned}$$
  • Rule 2: Action/observation exchange

    $$\begin{aligned} P(Y \mid do(X), Z) = P(Y \mid X, Z), \quad \text {if } Z \text { blocks all back-door paths from } X \text { to } Y \end{aligned}$$
  • Rule 3: Insertion and deletion of actions

    $$\begin{aligned} P(Y \mid do(X)) = P(Y), \quad \text {if there is no causal path from } X \text { to } Y \end{aligned}$$

The first rule suggests that the variables W can be omitted if they are irrelevant to the outcome Y. The second rule states that if the variables Z block all back-door paths from the treatment X to Y, the intervention do(X) can be replaced by conditioning on X. Finally, the third rule asserts that if there is no causal path from X to Y, the intervention on X can be removed entirely. A causal inference framework can estimate counterfactual outcomes under assumptions that satisfy the three criteria listed below (a compact formalization follows the list):

  • Ignorability: The treatment assignment and the potential (counterfactual) outcomes must be independent, which can be achieved by randomizing the treatment assignment. Since randomization is not feasible for observational data, the weaker conditional ignorability criterion should be satisfied, which requires that there are no unobserved confounders in the dataset.

  • Positivity: For all covariate values, the probability of receiving each treatment must be greater than 0.

  • Consistency: The outcome at unit i is affected only by the treatment at the same unit.
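
In the potential-outcomes notation commonly used in the causal inference literature, these criteria can be written compactly as follows (our paraphrase; \(Y(1)\) and \(Y(0)\) denote the potential outcomes and Z the observed covariates):

$$\begin{aligned} &\text {Ignorability:} \quad (Y(1), Y(0)) \perp T \mid Z \\ &\text {Positivity:} \quad 0< P(T=1 \mid Z=z) < 1 \ \text { for all } z \\ &\text {Consistency:} \quad Y = T\,Y(1) + (1-T)\,Y(0) \end{aligned}$$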

3.2 NLP with Causality

Texts are inherently high dimensional, and by encoding texts with language models, hidden factors such as topic, tone, and writing style can be discovered. BERT [6], a bi-directional transformer-based language model, was a breakthrough in NLP, outperforming previous models on many tasks by significant margins. However, Feder et al. [8] indicate that such models rely on statistical relationships while making decisions, so their predictions can be considered unreliable. Moreover, McCoy et al. [25] point out that these language models may fail when the data distribution of the test set changes significantly, since the models rely on statistical associations. As a result, causal models are required to increase the models' generalization performance.

Secondly, the reasoning of any model can be evaluated with sensitivity and invariance tests. The former identifies how much minimal perturbation is necessary to switch the model's decision for a given sample, while the latter determines whether a change in a causally unrelated feature impacts the model's decision. These tests can be valuable for interpreting the model's robustness by feeding it counterfactual inputs. Besides, Veitch et al. [42] state that invariant models can perform better across different data distributions.

3.3 Causal Model Explainability

Language models such as BERT [6] are not inherently explainable. According to Moraffah et al. [26], exploiting network artifacts such as attention weights is one approach to infer the decisions of a neural model; however, these approaches can only describe token-wise information. Perturbing instances near the decision boundary [23, 35] is another route to interpretability, yet the sentence-level estimates of such models may not be very successful [8]. In other words, these approaches may produce erroneous explanations for decision-makers since they compute correlations between features [4, 20, 37].

In this context, causal models can generate counterfactual instances which can be used for interpretability [8]. For instance, a data sample's prediction can be compared with that of its counterfactual counterpart. More specifically, if a text contains a concept, its counterfactual will not include that concept, and the two outputs can be compared to learn how the model makes decisions.

4 Methods

In this work, we investigate causal inference for the irony and sarcasm detection problem, which involves text analysis. Therefore, we apply a text-based causal inference algorithm, TextCause [32]. In addition to adapting TextCause to the irony/sarcasm detection problem, we extend the use of confounders through unsupervised data analysis.

4.1 Text-Based Causal Inference Using TextCause

TextCause, proposed by Pryzant et al. [32], employs the CausalBERT model [43], which adjusts for text in causal inference. The key insight of the TextCause algorithm is that neither the writer's intent nor the reader's perception can be identified directly from observational data. Therefore, the authors employ a proxy label \(\hat{T}\) to estimate the causal effect of a linguistic property; in other words, they train a proxy classifier to capture both the writer's intent and the reader's perception. The proposed structural causal model is presented in Fig. 1. According to this model, a writer writes a text W that contains a linguistic property T along with other covariates Z. The reader's perception of that linguistic property and the covariates is represented by \(\widetilde{T}\) and \(\widetilde{Z}\), which affect the outcome Y, and the perceived property can be estimated using a proxy label \(\hat{T}\). Besides, the authors state that the bias due to the proxy treatment decreases as the proxy classifier's accuracy increases. Therefore, for observational data, the actual linguistic property T can be measured using proxy labels \(\hat{T}\).

Fig. 1. The causal model in [32]

The conditional ignorability assumption of causal inference requires that, for observational data, the treatment assignment be independent of the potential outcomes given the confounders. In other words, this assumption states that we need to adjust for all confounders to estimate the causal effect of the treatment. The causal effect can be estimated using the Average Treatment Effect (ATE), formulated in Eq. 1.

$$\begin{aligned} ATE = E[Y; do(T=1)] - E[Y; do(T=0)] \end{aligned}$$
(1)

The ATE is the difference between the interventional outcome (T = 1) and the counterfactual outcome (T = 0). However, text documents may contain hidden confounders, such as tone and writing style, so the ATE must be adjusted for all confounders using Pearl's backdoor adjustment [29]. Since the authors use proxy labels to estimate the ATE, the modified ATE estimator is given in Eq. 2. The TextCause model uses DistilBERT to generate a representation of the text and employs the special classification token, CLS, to approximate the confounding information \(\hat{Z}\). The ATE estimator therefore relies on the treatment, the language model representation of the text, and the one-hot encoding of the covariates; as a result, the model learns two vectors that correspond to the language model representation and the one-hot encoded covariates, respectively.

$$\begin{aligned} ATE_{proxy} = E_{W}[E[Y | \hat{T}=1, \widetilde{Z}=f(W)] - E[Y | \hat{T}=0, \widetilde{Z}=f(W)]] \end{aligned}$$
(2)
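
To make Eq. 2 concrete, the following minimal sketch shows how a plug-in estimate of the adjusted ATE can be computed by fitting an outcome model on the proxy treatment and a confounder representation, then averaging the two counterfactual predictions over all documents. It is only a schematic analogue under our assumptions; the function names are hypothetical and scikit-learn's logistic regression stands in for the TextCause outcome heads.

import numpy as np
from sklearn.linear_model import LogisticRegression

def plug_in_ate(t_hat, z_feats, y):
    """Plug-in estimate of Eq. 2: E_W[ E[Y | T=1, Z] - E[Y | T=0, Z] ].

    t_hat   : (n,) array of 0/1 proxy treatment labels
    z_feats : (n, d) array of confounder features, e.g. a text embedding
              concatenated with one-hot encoded covariates
    y       : (n,) array of binary outcomes
    """
    t_hat = np.asarray(t_hat)
    X = np.column_stack([t_hat, z_feats])
    outcome_model = LogisticRegression(max_iter=1000).fit(X, y)

    # Predict the outcome for every document under both treatment values
    X_treated = np.column_stack([np.ones_like(t_hat), z_feats])
    X_control = np.column_stack([np.zeros_like(t_hat), z_feats])
    y1 = outcome_model.predict_proba(X_treated)[:, 1]
    y0 = outcome_model.predict_proba(X_control)[:, 1]

    return float(np.mean(y1 - y0))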

In addition to the text adjustment, another contribution of the TextCause algorithm is improving the recall of the proxy labels, which is motivated by lexicon induction [13] and label propagation [49]. The authors train logistic regression and pu-classifier models to predict proxy labels \(\hat{T^{*}}\) and relabel the instances that were labeled as \(\hat{T}\) = 0 but predicted as \(\hat{T^{*}}\) = 1. As a result, the improved proxy labels and the texts are both used to measure the causal effect, with the additional covariates and the language model representation of the text adjusted for as confounders. Hence, the TextCause algorithm combines proxy label improvement and text adjustment to estimate the causal effect of the desired linguistic property.
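
The relabeling step can be illustrated with the following sketch, which assumes bag-of-words features and a plain logistic regression in place of the pu-classifier; the names are hypothetical and the snippet is not the authors' implementation.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def boost_proxy_labels(texts, t_proxy):
    """Relabel documents with T_hat = 0 that a classifier predicts as positive."""
    X = CountVectorizer(min_df=2).fit_transform(texts)
    clf = LogisticRegression(max_iter=1000).fit(X, t_proxy)

    t_star = clf.predict(X)                       # predicted proxy labels T*
    boosted = np.asarray(t_proxy).copy()
    boosted[(boosted == 0) & (t_star == 1)] = 1   # T_hat = 0 but T* = 1
    return boosted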

4.2 Unsupervised Data Analysis for Determining Confounders

While applying text-based causal inference to the irony/sarcasm detection problem, categories or groupings within the text collection are considered as confounders. To determine these subgroups, two different techniques, topic modeling and clustering, are used.

Topic Modeling. Topic modeling is a statistical method for discovering latent topics in a corpus. It is an unsupervised technique that examines the semantic structures in a text, where each topic is represented by a group of similar words determined by the statistical model. A document can be a mixture of several topics with different proportions, based on the appearance of its words in the topics. Therefore, a document can be categorized by topic modeling based on the relevance of its words to the abstract topics.

Latent Dirichlet Allocation (LDA) [1] is one of the most popular topic modeling techniques. It is a generative statistical model that places Dirichlet priors on the word-topic and document-topic distributions and represents each document as a mixture of topics whose proportions are determined by the distribution over words. Given a corpus with M documents, where a document \(w_{i}\) contains N words and \(\alpha \) and \(\beta \) are the Dirichlet prior parameters, the probability distribution of the corpus can be expressed as in Eq. 3. In this study, we lemmatized texts using SpaCy and performed LDA to discover abstract topics that highlight several aspects of the document collection.

$$\begin{aligned} P(D \mid \alpha , \beta ) = \prod \nolimits _{m=1}^M \int P(\theta _{m}|\alpha )\left( \prod \nolimits _{n=1}^N \sum \nolimits _{Z_{mn}} P(Z_{mn} | \theta _{m}) P(W_{mn} | Z_{mn}, \beta )\right) \,d\theta _{m} \end{aligned}$$
(3)
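
A minimal sketch of this pipeline, assuming spaCy for lemmatization and gensim for LDA and coherence scoring (the model names, topic count, and preprocessing details shown are illustrative assumptions rather than our exact configuration):

import spacy
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.models.coherencemodel import CoherenceModel

nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

def lda_topic_labels(texts, num_topics=10):
    # Lemmatize and drop stop words / punctuation
    docs = [[tok.lemma_.lower() for tok in nlp(t) if tok.is_alpha and not tok.is_stop]
            for t in texts]
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=num_topics, passes=10, random_state=0)
    coherence = CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                               coherence="c_v").get_coherence()

    # Hardest topic assignment per document, usable as a confounder label
    labels = [max(lda.get_document_topics(bow), key=lambda x: x[1])[0]
              for bow in corpus]
    return labels, coherence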

Clustering. Texts are inherently high-dimensional, so a text should first be encoded into a latent vector space. Sentence embeddings map sentences to vectors that can be used, for instance, to measure semantic similarity between sentences or for text summarization. Transformers [41] made a remarkable impact on NLP tasks, surpassing previous models by a substantial margin. Reimers et al. [34] introduce S-BERT, a transformer-based sentence embedding model built on top of the pre-trained BERT [6] model that uses siamese and triplet networks to extract semantically meaningful sentence embeddings. S-BERT produces large sentence embedding vectors, which should be transformed into a lower-dimensional space for clustering. Dimensionality reduction techniques such as PCA [16] and t-SNE [24] can be applied to transform high-dimensional data into a lower-dimensional space while preserving the meaningful information in the data.

Clustering is an unsupervised machine learning technique that groups similar data instances together. K-Means is one of the most popular clustering methods; it assigns n data points to k clusters, with each data point assigned to the cluster whose center is nearest. Since unsupervised models do not have a ground truth, metrics such as the silhouette coefficient are used to measure clustering quality. This study uses S-BERT to encode texts into a fixed-size latent space and applies dimensionality reduction using PCA or t-SNE. Finally, the transformed data is given to a K-Means model to group semantically similar texts.
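
A minimal sketch of this pipeline, assuming the sentence-transformers and scikit-learn libraries (the embedding checkpoint, the number of PCA components, and the cluster count are illustrative assumptions):

from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_texts(texts, n_clusters=3, n_components=50):
    # Encode each text into a fixed-size sentence embedding
    encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    embeddings = encoder.encode(texts)

    # Reduce dimensionality before clustering
    reduced = PCA(n_components=n_components, random_state=0).fit_transform(embeddings)

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = kmeans.fit_predict(reduced)
    quality = silhouette_score(reduced, labels)
    return labels, quality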

4.3 Modeling Causal Inference for Irony and Sarcasm Detection

In this work, we explore the cause-effect relationship in irony and sarcasm detection through two scenarios. The treatments (T), outcomes (Y), and confounders (Z) considered in these scenarios are as follows.

Case 1. We measure the effect of writing sarcastic posts (T) on the popularity of the post, i.e. the number of likes (Y), and consider the subreddit category, the cluster label (from the K-Means model), and the topic category (from the LDA model) as confounders (Z), separately.

Case 2. We examine whether putting an exclamation mark (!) affects irony detection. In other words, we explore whether the exclamation mark (T) affects the readers' perception of a text as ironic (Y). The cluster label and topic category are again considered as confounders (Z) in this scenario.

5 Experiments

5.1 Dataset and Settings

The first dataset used in our study is the Self-Annotated Reddit Corpus (SARC) [19], which contains 1.3 million sarcastic Reddit posts. It is a publicly available dataset in which statements that end with the "/s" marker, a common sarcasm marker among Reddit users, are annotated as sarcastic. Consequently, the dataset might contain some false negative statements, i.e., statements that should be annotated as sarcastic but are not marked as such. Moreover, we should not assume that all Reddit users know this marker, so the dataset might also contain some false positive statements. Secondly, we use a Turkish tweet dataset for irony detection [3, 28], which contains 300 non-ironic and 300 ironic tweets in Turkish, annotated manually.

The experiments are performed on an Nvidia GeForce RTX 2080 Super GPU with 8 GB of memory; the machine also has a 6-core (12-thread) Intel i7-8700K CPU @ 3.7 GHz. For the implementation, Huggingface's multilingual DistilBERT [39] is used, a lighter BERT variant that performs very close to the original model with significantly fewer parameters. Additionally, we performed validation experiments to tune hyperparameters such as the number of epochs and the learning rate. In Sect. 5.2, we present only the results with the best hyperparameter settings.
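
For reference, the multilingual DistilBERT backbone can be loaded with Huggingface's transformers library as sketched below; the hyperparameter values shown are illustrative placeholders rather than our exact settings.

from transformers import DistilBertTokenizerFast, DistilBertModel

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-multilingual-cased")
model = DistilBertModel.from_pretrained("distilbert-base-multilingual-cased")

# Illustrative hyperparameters chosen via validation experiments
config = {"epochs": 5, "learning_rate": 2e-5, "batch_size": 32}

inputs = tokenizer(["İroni tespiti için örnek bir cümle."],
                   padding=True, truncation=True, return_tensors="pt")
cls_representation = model(**inputs).last_hidden_state[:, 0]  # [CLS]-position vector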

5.2 Results

Case 1 Results. In this experiment, we assume that the subreddit category, topic label, and cluster label affect both the treatment and the outcome, so we consider these attributes as confounders.

Firstly, we gather the posts in the "AskReddit" (Z = 0), "news" (Z = 1), "worldnews" (Z = 2), and "politics" (Z = 3) subreddits. Posts with a score above five are annotated as "liked" comments, and posts with a score below 0 are annotated as "disliked" comments. Overall, approximately 37 K comments satisfy these conditions. The number of popular (liked) posts within each confounder setting is given in Fig. 2.

Fig. 2. Number of Reddit posts for each confounder setting
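
The labeling rule above can be summarized by the following sketch; the column names such as "subreddit" and "score" are hypothetical placeholders for the fields in the SARC export.

import pandas as pd

SUBREDDIT_TO_Z = {"AskReddit": 0, "news": 1, "worldnews": 2, "politics": 3}

def prepare_case1(df: pd.DataFrame) -> pd.DataFrame:
    """Assumes df has 'subreddit' and 'score' columns for each comment."""
    df = df[df["subreddit"].isin(SUBREDDIT_TO_Z)].copy()
    df["Z"] = df["subreddit"].map(SUBREDDIT_TO_Z)

    # Keep only clearly liked (score > 5) or disliked (score < 0) comments
    df = df[(df["score"] > 5) | (df["score"] < 0)]
    df["Y"] = (df["score"] > 5).astype(int)  # 1 = liked, 0 = disliked
    return df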

Secondly, we assume that the LDA topic labels can be used as a confounder. We measure the coherence score for various topic counts and observe that a setting of 10 topics is a reasonable choice among the alternatives, with a coherence score of 0.312. Likewise, we apply K-Means clustering to the collection of posts to find a suitable number of clusters. According to Fig. 3, K = 3 is a sensible choice among the selected values based on the elbow analysis. Additionally, the PCA and t-SNE plots for K = 3 are given in Fig. 4.

Fig. 3. WSS and silhouette plots

Fig. 4. K-Means clusters of Reddit comments

Finally, we measure the ATE score using the subreddit category, topic label, and cluster label as confounders. Since the TextCause model requires proxy labels, we trained a BERT model using 400 K Reddit documents (80%-20% train-validation split) from other categories. The accuracy of the proxy classifier on the selected subreddits is 78.6%, and its F1-score is 0.806. The TextCause model measures the oracle ATE value using the ground-truth sarcasm label, while the unadjusted ATE measures the treatment effect without adjusting for any covariates. The T-boost values use improved proxy treatments obtained with a pu-classifier (to improve the recall for positive instances) or logistic regression. W adjust is another estimator that adjusts for the text, and the last two estimators combine W adjust with T-boost.

We trained the TextCause algorithm for five epochs. According to the ATE scores in Table 1, adjusting for the topic label, cluster label, and subreddit category improves the ATE estimate. The oracle value suggests that a sarcastic writing style increases the chance of a post being liked by between 6% and 10%. Additionally, the closest estimates are produced by the T-boost reg model, and the TextCause model's subreddit and cluster label estimates are very close to the oracle estimator. However, when we adjust for topic labels, the unadjusted ATE estimator, which calculates the ATE without adjusting for any covariate, becomes the second closest estimator overall.

Table 1. Case 1: Subreddit, topic and cluster labels were considered as confounders

Case 2 Results. In this experiment, we measure the effect of using an exclamation mark (!) on perceived irony. Since the treatment is directly observable, there is no need for a proxy label. We evaluate this causal question on the Turkish irony dataset annotated by [3, 28]. As in the first experiment, we consider the topic and cluster labels as confounders. Figure 5 shows the number of tweets for each confounder setting. According to the WSS and silhouette plots given in Fig. 3, the highest silhouette score is obtained for K = 2. The clusters projected with PCA and t-SNE are presented in Fig. 6. For the LDA model, a 10-topic setting is a reasonable choice, with a coherence score of 0.7318.

Fig. 5. Number of tweets for each confounder setting

We trained the TextCause algorithm for 15 epochs. According to the ATE results presented in Table 2, the treatment has a considerable impact on the posts' perceived irony. However, contrary to our expectations, there is an inverse relationship between the treatment and the outcome. As seen in Fig. 5, this is possibly because only 17% of the ironic tweets (51 out of 300) contain an exclamation mark. In addition, text adjustment with LDA topic labels produces the estimate closest to the oracle value, whereas for cluster labels the unadjusted setting is the closest among all estimators. Note that we do not present the results of the T-boost estimators because proxy labels were not appropriate in this setting.

Fig. 6. K-Means clusters of tweets

Table 2. Case 2: Topic and cluster labels were considered as confounders

6 Conclusion

This study addresses the application of causal inference to text analysis. Specifically, we employ the TextCause algorithm [32] to estimate the causal effect of sarcastic linguistic properties on a text's popularity and the effect of punctuation, particularly the exclamation mark (!), on detecting irony. Moreover, we perform unsupervised data analysis using clustering and topic modeling and utilize the outputs of these methods as confounders for the causal inference. According to the measurements, cluster and topic labels may contain latent information about ironic linguistic properties and the popularity of posts. As future work, the results can be re-examined in depth in terms of explainability; for instance, counterfactual statements that do not contain a specific linguistic property can be generated and fed into the causal-text model, and the results can be examined in terms of invariance and sensitivity.