Depression and anxiety are associated with patterns of negative thinking that can be targeted through cognitive restructuring as a part of cognitive therapy (CT) or cognitive behavioral therapy (CBT). Our team has created a set of cognitive distortion schemata (CDS) n-grams based on theories underlying CT to measure the linguistic markers that indicate cognitive vulnerability to depression. These CDS were specifically designed to examine online language. Our prior work supports a relationship between CDS and a diagnosis of depression, but less is known about the relationship between online language, CDS, and anxiety. The current study examines whether CDS can be detected in people who report anxiety symptoms, and whether CDS increase with symptom severity.
Methods
A total of 1,377 participants were recruited from a study assessing social media use and mental health symptoms, the Studies of Online Cohorts of Internalizing Symptoms and Language (SOCIAL). From these participants, 804 timelines were harvested, and after removing missing data and bots, our final sample was 537 respondents who posted 999,859 tweets. This is a longitudinal, multi-method design, using surveys and text-based analysis of social media timelines. We used bootstrap resampling to compare differences in CDS prevalence in anxious and depressed participants.
Results
CDS can be observed in anxiety disorders, significantly increase as a function of anxiety symptom severity, and are related to depression and anxiety comorbidity.
Conclusions
Using behavioral, affective, and cognitive indicators of distorted thinking from social media may yield new insight into the trajectories of depression and anxiety. This work has implications for the future of CT/CBT and other online interventions that target distorted thinking styles.
Notes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Introduction
Anxiety and mood disorders, also known as internalizing disorders, are among the most common mental disorders and leading contributors to the burden of disability worldwide (Greenberg et al., 1999, 2015; Konnopka & König, 2020). Internalizing disorders are heterogeneous in their symptoms (Fried et al., 2020; Lorenzo-Luaces et al., 2020, 2021) and prognosis (Lorenzo-Luaces, 2015), and they affect multiple bodily systems, including circadian cycles (Ten Thij et al., 2020), emotion processing (Rutter et al., 2022; Rutter, Norton et al., 2020; Rutter, Passell, Rutter et al., 2019, 2020a, b, 2024), hormones (Granger et al., 1994; Young & Korszun, 2010), physical health (Berkman et al., 1986; Rutter et al., 2021), and even language (Bathina et al., 2021; Bollen et al., 2021). Regarding language, prior work has shown that individuals with symptoms of depression more frequently use first-person pronouns, absolutist language, and terms that describe negative emotions (Al-Mosaiwi & Johnstone, 2018; Bernard et al., 2016; Rude et al., 2004; Tackman et al., 2019). These findings are consistent with research suggesting that internalizing disorders are characterized by self-focus and negative affectivity. Individuals with a diagnosis of depression also display more distorted negative thinking in real life (Beck, 1963) and in their online communications (Bathina et al., 2021).
Aspects of distorted thinking that manifest in distinctive word choices and sentence structure are often targeted directly in therapy. Indeed, one of the leading evidence-based treatments for internalizing disorders is cognitive behavioral therapy (CBT), which aims to restructure negative thinking and change associated behaviors (Lorenzo-Luaces et al., 2015, 2016). Investigating the language of individuals with internalizing disorders may help in understanding these heterogeneous and complex disorders. A better understanding of language indicative of depression and related disorders could assist in early detection or more individualized treatment. The relationship between distorted thinking in anxiety and depression, and the ways in which these distorted thinking styles contribute to and perpetuate symptoms, is well established (e.g., Beck & Clark, 1988; Eysenck et al., 2007). Cognitive distortions are central to creating and maintaining symptoms across internalizing disorders. The novelty of our approach lies in using social media data to explore cognitively distorted language. Social media provides a platform where researchers can learn more about the language of depression and anxiety as it naturally occurs. In the United States, over 70% of individuals are on a social media platform. The insights gained from social media data could directly inform the treatment of internalizing disorders through machine learning and AI tools that detect depressive thinking styles and could be used for prevention and early intervention (Babu & Kanaga, 2022; Deshpande & Rao, 2017). This work also has the potential to contribute to the growing field of large language models (LLMs) in mental health (Lawrence et al., 2024). Moreover, understanding vulnerability to internalizing disorders is important because social media itself may be associated with internalizing symptoms (Rutter et al., 2021).
Grounded in cognitive therapy and Beck’s cognitive theory of depression, we have previously demonstrated that cognitive distortion schemata (CDS), or the patterns of thought represented by sequences of words (n-grams), differ between a depressed sample and a random sample on Twitter (Bathina et al., 2021). The CDS were created by a team of clinical psychologists and experts in natural language processing (NLP). We employed a theory-driven approach in creating the CDS, which were derived to measure the linguistic markers that indicate cognitive vulnerability to depression. The CDS categories we developed correspond to 12 widely accepted styles of distorted thinking, including all-or-nothing thinking, catastrophizing, fortune telling, and others. Our previous results show that CDS are more prevalent in depressed individuals than in a random sample of adults, even when controlling for other elements of depressogenic language, including first-person pronouns and negative language (Holtzman et al., 2017; Tackman et al., 2019). In other words, our findings could not be explained by higher use of “I/me/my” or more words like “sad/lonely/depressed.” Rather, the differences are better explained by language structure and patterns themselves. Indeed, we have also found evidence that this kind of language may be increasing over time (Bollen et al., 2021), which connects with concerns about the negative effects of social media.
While we have previously shown higher rates of CDS in depressed Twitter users compared to a random sample (Bathina et al., 2021; Bollen et al., 2021), we have not explored cognitive distortions online among anxious people. There is some recent evidence that natural language sentiment is an indicator of depression and anxiety symptoms (Kaźmierczak et al., 2024), but this work is not informed by cognitive theories of depression or cognitive therapy (CT). Kaźmierczak and colleagues’ recent work matches positively (e.g., beautiful) and negatively (e.g., horrible) valenced words to the severity of depression and anxiety symptoms, whereas our work developed a lexicon based on Beck’s cognitive distortions. While valence is useful for understanding the connections between word use and symptoms, the general lack of theory-driven language processing work is a major gap in the literature. A better understanding of naturalistic (casual, online) cognitively distorted language could provide better characterizations of internalizing disorders, which could help identify more precise treatment targets given the high comorbidity of anxiety and depressive disorders (Lorenzo-Luaces et al., 2022; Rutter & Brown, 2015; Rutter et al., 2023). Moreover, while we have previously examined prevalence rates of CDS, we have not examined whether their use is related to symptom severity on a continuum, which comports with a modern understanding of depression and anxiety.
Social media data provides a unique and naturalistic way of understanding language and its relationship to psychopathology. Rather than using language transcripts from therapy or coding language for content and themes, big data approaches allow for a better understanding of language use and thinking styles at the population level. A better characterization of cognitively distorted language online can help inform large language models for mental health, which could ultimately be used to enhance interventions. We would expect more severe symptoms of anxiety and depressive psychopathology to be associated with greater use of cognitive distortions. Thus, the current study was centered on two main research questions. First, does CDS prevalence increase as a function of anxiety severity (RQ1)? Second, how is CDS prevalence related to anxiety and depression comorbidity (RQ2)? For each research question, we proposed a hypothesis. First, we expected that cognitive distortions in online language could be observed in individuals with anxiety disorders, with CDS prevalence calculated as the number of tweets containing CDS n-grams divided by the total number of tweets, and we hypothesized that CDS prevalence would increase as anxiety symptom severity increased. Second, we hypothesized that we would observe a relationship between depression severity and CDS prevalence, with higher levels of both depression and anxiety associated with the highest proportion of distorted thinking in online language.
Method
This study was approved by the institutional review board of Indiana University Bloomington (2002549202 and 2005948214). We aimed to recruit approximately 1000 online participants who were Twitter users between July 2020 and March 2021.
Participants
We recruited participants via Qualtrics panels for a study on “social media and mental health” between July 2020 and March 2021. Participants were purposefully selected to represent the United States demographic distributions of age, gender, race, and ethnicity and received financial compensation for their participation. Participants were part of the Survey Online Cohorts for Internalizing Symptoms and Language (SOCIAL), specifically SOCIAL-I, described in prior work (Lorenzo-Luaces et al., 2022; Rutter et al., 2023). Participants (N = 3,472) were asked to provide their Twitter handle in addition to answering a variety of self-report questions addressing mental health. From the provided Twitter handles (N = 1,377), personal Twitter timelines were harvested (N = 804) and evaluated for being human vs. bot-like (see Lorenzo-Luaces et al., 2023 for a more detailed analysis). Participants were removed from our analysis if they did not fully complete the survey or if we were not able to retrieve a unique Twitter timeline based on the Twitter handle they provided, leaving N = 691. Finally, we ran a bot analysis using Botometer (Sayyadiharikandeh et al., 2020) on each of these profiles and removed the accounts that were deemed very likely to be automated. Based on these criteria, our final sample consists of 537 respondents, who posted 999,859 tweets in their timelines. Demographic information is included in Table 1.
Measures
Severity Measure for Generalized Anxiety Disorder
The severity measure for generalized anxiety disorder (GAD) consists of 10 items rated on a 5-point Likert scale ranging from 0 (“never”) to 4 (“all of the time”). Participants were asked to rate the frequency of worry and associated symptoms over the past 7 days. Total scores range from 0 to 40, with higher scores indicating greater severity of GAD symptoms. The average score (total raw score/number of items answered) can be used as a proxy for GAD severity: 0 indicates none, 1 mild, 2 moderate, 3 severe, and 4 extreme anxiety. In our sample, Cronbach’s alpha was 0.92 for SOCIAL-I, indicating excellent internal consistency.
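The average-score rule above can be sketched as follows. This is a minimal illustration, assuming unanswered items are passed as `None` and that fractional averages are rounded half-up to the nearest whole class; the text does not say how non-integer averages were binned, so the rounding step and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence

# Class labels for the rounded average score, as listed in the text.
GAD_CLASSES = ["none", "mild", "moderate", "severe", "extreme"]


def gad_average_score(items: Sequence[Optional[int]]) -> float:
    """Total raw score divided by the number of items answered.

    Unanswered items are passed as None and skipped in both the
    numerator and the denominator.
    """
    answered = [x for x in items if x is not None]
    if not answered:
        raise ValueError("no items answered")
    if any(not 0 <= x <= 4 for x in answered):
        raise ValueError("item scores must lie in 0..4")
    return sum(answered) / len(answered)


def gad_severity_class(items: Sequence[Optional[int]]) -> int:
    """Map the average score onto 0 (none) .. 4 (extreme).

    Assumption: fractional averages are rounded half-up; the source
    does not specify the binning rule.
    """
    return int(math.floor(gad_average_score(items) + 0.5))


if __name__ == "__main__":
    responses = [1, 2, 2, 1, 3, 2, 1, 2, 2, 1]  # hypothetical participant
    print(gad_average_score(responses))  # 1.7
    print(GAD_CLASSES[gad_severity_class(responses)])  # moderate
```

Rounding half-up is used instead of Python's built-in `round`, which rounds exact halves to the nearest even integer and would place an average of 2.5 in the moderate rather than the severe class.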
Severity Measure for Depression
The severity measure for depression is also known as the Patient Health Questionnaire-9 (PHQ-9) (Kroenke et al., 2001) and consists of 9 items rated on a 4-point Likert scale. Items are rated based on symptoms over the last 7 days and range from 0 (“not at all”) to 3 (“nearly every day”). The PHQ-9 has excellent psychometric properties. To align with the GAD-10 severity scores described above, we scored the PHQ-9 using established cutoffs, which map total scores onto an aggregate severity measure as follows: 0–4: none (0), 5–9: mild (1), 10–14: moderate (2), 15–19: severe (3), and 20–27: extreme (4). In our sample, Cronbach’s alpha was 0.87 for SOCIAL-I, suggesting that this measure was internally consistent.
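The PHQ-9 cutoffs listed above amount to a simple lookup from total score to severity class. The sketch below assumes complete responses (no prorating of missing items); the function and constant names are hypothetical.

```python
from typing import Sequence, Tuple

# (lowest total, highest total, severity class) for PHQ-9 totals, as listed in the text.
PHQ9_BANDS = [(0, 4, 0), (5, 9, 1), (10, 14, 2), (15, 19, 3), (20, 27, 4)]
SEVERITY_LABELS = ["none", "mild", "moderate", "severe", "extreme"]


def phq9_severity(items: Sequence[int]) -> Tuple[int, int]:
    """Return (total score, severity class 0-4) for nine items scored 0-3."""
    if len(items) != 9:
        raise ValueError("the PHQ-9 has exactly 9 items")
    if any(not 0 <= x <= 3 for x in items):
        raise ValueError("item scores must lie in 0..3")
    total = sum(items)
    for low, high, cls in PHQ9_BANDS:
        if low <= total <= high:
            return total, cls
    raise AssertionError("unreachable: totals always fall within 0..27")


if __name__ == "__main__":
    total, cls = phq9_severity([2, 2, 1, 1, 2, 1, 1, 1, 0])  # hypothetical participant
    print(total, SEVERITY_LABELS[cls])  # 11 moderate
```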
Cognitive Distortion Schemata
Beck proposed the concept of cognitive distortions to characterize the thinking of individuals with depression (Beck, 1963). We drew on the most recent lists of cognitive distortions, which comprise 12 categories of cognitive distortion. We (L.A.R., L.L-L., and J.B.) iteratively designed a list of 241 CDS n-grams in consultation with CBT experts. Each CDS n-gram is geared to express at least one type of cognitive distortion. The CDS were formulated to capture the minimal semantic building blocks of distorted thinking while avoiding expressions that are specific to depression-related topics, such as poor sleep or ongoing health issues. Where possible, higher-order n-grams were chosen to capture as much of the semantic structure of one or more distorted schemata as possible (Bathina et al., 2021). For example, the 3-gram ‘everyone will believe’ captures both ‘overgeneralizing’ and ‘mindreading’. For more details on CDS construction, as well as details on the CDS n-grams and relevant grammatical features, see Bathina et al., 2021. While our initial work examined CDS across the 12 categories, the current project collapses all CDS categories into one, representing all types of distorted thinking. We did this for ease of analysis and interpretation, and because our hypotheses did not distinguish conceptually between the categories. To collapse the CDS categories into a single metric, we calculated, for each participant, the proportion of all tweets that contain any occurrence of a CDS n-gram as a measure of CDS prevalence.
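The per-participant prevalence described above can be sketched as contiguous n-gram matching over tokenized tweets. This is a minimal illustration, not the authors' implementation. The lowercase regex tokenizer is an assumption, and only ‘everyone will believe’ comes from the text. The other two lexicon entries are hypothetical placeholders, not items from the published list of 241 CDS n-grams.

```python
import re
from typing import Iterable, List, Sequence, Tuple

# Illustrative lexicon only: "everyone will believe" is the example given in the
# text; "nobody ever" and "will never be" are hypothetical placeholders.
EXAMPLE_CDS_NGRAMS = ["everyone will believe", "nobody ever", "will never be"]

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return TOKEN_RE.findall(text.lower())


def compile_lexicon(ngrams: Iterable[str]) -> List[Tuple[str, ...]]:
    """Turn each n-gram string into a tuple of tokens for matching."""
    return [tuple(tokenize(g)) for g in ngrams]


def contains_cds(tweet: str, lexicon: Sequence[Tuple[str, ...]]) -> int:
    """F(t): 1 if any lexicon n-gram occurs as a contiguous token run, else 0."""
    tokens = tokenize(tweet)
    for gram in lexicon:
        n = len(gram)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == gram:
                return 1
    return 0


def cds_prevalence(tweets: Sequence[str], lexicon: Sequence[Tuple[str, ...]]) -> float:
    """Per-participant CDS prevalence: CDS-containing tweets / all tweets."""
    if not tweets:
        raise ValueError("participant has no tweets")
    return sum(contains_cds(t, lexicon) for t in tweets) / len(tweets)


if __name__ == "__main__":
    lexicon = compile_lexicon(EXAMPLE_CDS_NGRAMS)
    timeline = [  # hypothetical tweets
        "Everyone will believe the rumor",
        "Lovely day at the beach",
        "Nobody ever calls me back",
        "this will never be fixed",
    ]
    print(cds_prevalence(timeline, lexicon))  # 0.75
```

Each tweet counts at most once, however many n-grams it contains. This matches the definition of prevalence as the proportion of tweets containing any CDS n-gram.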
Data Analysis
We report descriptive statistics for the demographic variables of participants in SOCIAL-I. Then, we used bootstrap resampling to compare differences in CDS prevalence by anxiety and depression severity. We calculated the between-group CDS prevalence as the proportion of CDS-containing tweets produced by the subset of the resampled group with a given severity score. In other words, for each severity class \(C\) we have a corresponding set of tweets \(T_C\), the set of all tweets produced by the individuals in \(C\). Our CDS n-gram schemata define a function \(F(t) \to \{0,1\}\) that maps each tweet \(t\) to 1 if it contains any CDS n-gram and to 0 otherwise. The prevalence for each severity class \(C\) is thus calculated as \(P(C) = \frac{1}{|T_C|}\sum_{t \in T_C} F(t)\). To establish the accuracy of these prevalence calculations, we applied bootstrap resampling. Our bootstrap analysis consisted of randomly resampling with replacement \(n\) individuals from our sample population, where \(n\) is the size of our sample. Bootstrap estimates were obtained by repeating this resampling \(B = 10{,}000\) times and recording the CDS prevalence at each step. The bootstrap analysis thus produces a distribution of CDS prevalence estimates for each class; when the median of one class falls within the central 95% interval of another class's estimates, the difference between the two severity groups is considered non-significant. As noted above, there were 5 severity groups for GAD (none, mild, moderate, severe, extreme), for which we bootstrapped the CDS prevalence across the individuals in each severity group (Fig. 1). However, in our sample the extreme severity class comprised only 16 individuals. This sample size was too small to be considered valid for bootstrap analysis, so this class was excluded from the final analysis. We followed the same process to examine depression severity and CDS prevalence, again using bootstrapping and examining overlaps with the 95% confidence intervals (Fig. 2).
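The bootstrap procedure above can be sketched in plain Python. This is a minimal sketch under several assumptions. Each participant is summarized as a hypothetical `(severity_class, n_tweets, n_cds_tweets)` tuple. Class prevalence pools all tweets from the resampled members of a class, as described in the text. The central interval uses a simple nearest-rank percentile. The significance check follows the stated rule: the median of one class must lie outside the central 95% interval of the other.

```python
import random
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical participant summary: (severity_class, n_tweets, n_cds_tweets).
Participant = Tuple[int, int, int]


def class_prevalence(sample: Sequence[Participant]) -> Dict[int, float]:
    """Pooled prevalence per class: CDS tweets / all tweets from members of the class."""
    cds: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for cls, n_tweets, n_cds in sample:
        cds[cls] += n_cds
        total[cls] += n_tweets
    return {c: cds[c] / total[c] for c in total if total[c] > 0}


def bootstrap_prevalence(participants: Sequence[Participant], b: int = 10_000,
                         seed: int = 0) -> Dict[int, List[float]]:
    """Resample n individuals with replacement b times; record class prevalences."""
    rng = random.Random(seed)
    n = len(participants)
    draws: Dict[int, List[float]] = defaultdict(list)
    for _ in range(b):
        resample = rng.choices(participants, k=n)
        for c, p in class_prevalence(resample).items():
            draws[c].append(p)
    return dict(draws)


def central_interval(values: Sequence[float], level: float = 0.95) -> Tuple[float, float]:
    """Nearest-rank approximation of the central `level` interval."""
    s = sorted(values)
    lo_idx = int(((1 - level) / 2) * (len(s) - 1))
    hi_idx = int(((1 + level) / 2) * (len(s) - 1))
    return s[lo_idx], s[hi_idx]


def differs(draws_a: Sequence[float], draws_b: Sequence[float], level: float = 0.95) -> bool:
    """True when the median of class B lies outside the central interval of class A."""
    lo, hi = central_interval(draws_a, level)
    return not (lo <= statistics.median(draws_b) <= hi)


if __name__ == "__main__":
    # Hypothetical toy data, not study data.
    rng = random.Random(42)
    toy = [(c, 200, rng.randint(5 + 5 * c, 15 + 5 * c)) for c in range(4) for _ in range(30)]
    draws = bootstrap_prevalence(toy, b=1000, seed=7)
    for c in sorted(draws):
        lo, hi = central_interval(draws[c])
        print(c, round(statistics.median(draws[c]), 4), (round(lo, 4), round(hi, 4)))
    print("class 0 vs 3 differ:", differs(draws[0], draws[3]))
```

Resampling individuals rather than tweets keeps each person's timeline intact. As a result, heavy posters cannot dominate the interval width through many near-duplicate draws. A class that is too small for a stable bootstrap, such as the 16-person extreme GAD class, would be filtered out before calling these functions.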
To further test the relationships between variables, we next computed Spearman rank-order correlations among anxiety, depression, age, sex, and CDS prevalence (Fig. 3a). This correlation measure was selected because of concerns about the assumption that aggregate Likert-scale questionnaire scores are linear measures of symptom severity. Rank-order correlation preserves the ordering of severity without assuming that such questionnaire measures are linear metrics of underlying psychological traits (Liddell & Kruschke, 2018). We further examined the relationship between the comorbidity of anxiety and depression and CDS prevalence by computing Spearman rank-order correlations over the shared and unique variance (Hoza et al., 2005) between anxiety and depression (Fig. 3b). Here, we calculate the shared variance between GAD-10 score and PHQ-9 score as \({{{z_d} + {z_a}} \over 2}\) and the unique variance as \({{{z_d} - {z_a}} \over 2}\), where \(z_d\) is the z-scored PHQ-9 score and \(z_a\) is the z-scored GAD-10 score. These values thus account for the shared and marginal effects of depression and anxiety.
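The shared/unique decomposition and the rank correlations above can be sketched with the standard library alone. The sketch assumes the shared component is the mean of the two z-scores and the unique component is half their difference. It z-scores with the sample standard deviation and computes Spearman's rho as the Pearson correlation of average ranks, so ties share a rank. The function names are hypothetical. SciPy's `scipy.stats.spearmanr` is a library alternative for the correlation step.

```python
import statistics
from typing import List, Sequence, Tuple


def zscores(x: Sequence[float]) -> List[float]:
    """Standardize scores using the sample mean and sample SD."""
    mu = statistics.mean(x)
    sd = statistics.stdev(x)
    return [(v - mu) / sd for v in x]


def shared_unique(phq9: Sequence[float],
                  gad10: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Shared = (z_d + z_a) / 2 and unique = (z_d - z_a) / 2 (assumed definitions)."""
    zd, za = zscores(phq9), zscores(gad10)
    shared = [(d + a) / 2 for d, a in zip(zd, za)]
    unique = [(d - a) / 2 for d, a in zip(zd, za)]
    return shared, unique


def average_ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy scores, not study data.
    phq9 = [3, 8, 12, 15, 20, 5]
    gad10 = [4, 10, 14, 20, 30, 6]
    cds = [0.02, 0.04, 0.03, 0.06, 0.07, 0.03]
    shared, unique = shared_unique(phq9, gad10)
    print("shared vs CDS:", spearman(shared, cds))
    print("unique vs CDS:", spearman(unique, cds))
```

When the two questionnaires are highly correlated, as reported for GAD-10 and PHQ-9, the unique component has little spread. Its correlations with CDS prevalence are therefore expected to be weaker and noisier than those of the shared component.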
Note. Ten participants did not respond to the demographic portion of the survey, so the total for each demographic category is 527 of 537 participants.
Spearman rank-order correlations are shown in Fig. 3. The rank correlations show statistically significant relationships across all variables, although the effects are small, as has previously been reported when correlating NLP metrics with self-report (Holtzman et al., 2017). Anxiety (GAD-10) and depression (PHQ-9) scores were highly correlated, with a correlation coefficient of 0.76, reflecting the high comorbidity of anxiety and depression. These severity scores were also correlated with CDS prevalence, with coefficients of 0.085 and 0.079, respectively. These correlation values are consistent with prior studies applying NLP to self-reported mental health data (DeSouza et al., 2021; Milintsevich et al., 2023).
To test our first hypothesis that CDS prevalence was related to anxiety symptom severity, we conducted regression. As shown in Fig. 1, there is a relationship between anxiety and online cognitive distortions, reflected in increased CDS prevalence, with individuals with none-minimal anxiety having the lowest CDS prevalence. CDS prevalence shows an increasing trend across GAD severity classes, with moderate and severe GAD scores showing significantly greater prevalence than none-minimal and mild GAD. However, the differences between none-minimal and mild, and between moderate and severe, do not reach statistical significance. In general, these results support our hypothesis that CDS prevalence increases with anxiety severity.
Fig. 1
Bootstrapped aggregate CDS prevalence distributions for each GAD-10 severity class. Note. Each bar is annotated above with the sample size of the corresponding severity class. The colored box represents the interquartile range, while the horizontal lines correspond to the 95% CI. Our results show a trend of increasing CDS prevalence as severity increases, with pairwise significant differences denoted by braces.
To test RQ2, we examined the relationship between CDS prevalence and depression severity. CDS prevalence increases with depression severity (Fig. 2). However, there is no significant difference between the extreme and minimal levels of depression. We attribute this to the small number of individuals in the extreme category, whose bootstrapped intervals are the widest.
Fig. 2
Bootstrapped aggregate CDS prevalence distributions for each PHQ-9 severity class. Note. Each bar is annotated above with the median and 95% CI bounds in brackets. The colored box represents the interquartile range, while the horizontal lines correspond to the 95% CI. Pairwise significant differences are denoted by braces.
Overall, our results largely supported our hypotheses. First, CDS increase as anxiety severity increases. Support for our second hypothesis, that comorbid GAD and depression would produce high levels of CDS, is more mixed. Because our rank-order correlation analysis shows a high correlation between GAD-10 and PHQ-9 scores (see Fig. 3), our cohort unfortunately does not lend itself to comorbidity analyses: few individuals scored high on one questionnaire and low on the other. However, adding support to our prior work, CDS increase as depression severity increases from minimal through severe levels of depression (see Fig. 2).
Fig. 3
Pairwise Spearman rank-order correlation coefficients between (a) GAD-10, PHQ-9, and confounding variables and (b) the shared and unique variance between PHQ-9 and GAD-10. Note. Significance is denoted below each coefficient value by **: p < 0.01 and *: p < 0.05.
Discussion
To the best of our knowledge, this is the first study to examine cognitive distortions in online language in mood and anxiety disorders using natural language processing (NLP) and a theory-based lexicon of n-grams. While some prior work using natural language processing has shown differences in language and thinking styles in depressed vs. random cohorts, less focus has been given to anxiety. This is a critical gap in our understanding of the thinking styles across internalizing disorders, especially considering that anxiety disorders increase the risk for mood disorders (Black et al., 2010; Duffy et al., 2013; Hong & Cheung, 2015), and that anxiety disorders alone are associated with a very high level of distress, impairment, and disability (Greenberg et al., 1999).
Our primary finding was that CDS increase as levels of generalized anxiety symptoms increase. We observed a similar trend for depressive symptoms as measured by the PHQ-9: as symptoms of depression increased, the prevalence of distorted thinking increased, controlling for age and sex covariates. Both the shared and the unique variance of anxiety and depression symptoms were related to overall CDS prevalence, but the Spearman rank-order correlation was larger for the shared variance (0.089) than for the unique variance (0.041). Regarding the size of the effects we observed, there are many reasons why the variance in CDS prevalence explained by severity is expected to be low. For example, many other variables affect the expression of CDS, including time of year and the content of the post. Indeed, in a recent study examining emotion connectivity and symptoms of depression, effect sizes were similarly small (Kelley et al., 2023).
Based on the CDS construction, which was informed by Beck’s original cognitive distortion categories (Beck, 1963) and expanded by 10 CBT experts on our team (Bathina et al., 2021), we provide validation of distorted thinking in anxiety disorders in online language. This work sheds additional light on the degree to which cognitively distorted and depressogenic language occurs colloquially on social media platforms. Rather than developing a classifier to detect anxiety and depression online, which other research groups are already doing, our work seeks to better characterize distorted thinking online, naturalistically. Our CDS development and subsequent work based in CT is theory-driven, as opposed to purely data-driven NLP approaches. This work can be used to better understand how language is used by vulnerable individuals within social networks. Moreover, our work can be used in tandem with emerging work applying LLMs to support mental health (Lawrence et al., 2024), and with the growing body of literature applying NLP and machine learning to mental health (Le Glaz et al., 2021). Our work has potential large-scale societal relevance: as individuals connect across the globe, they may use distorted language in specific ways within their social networks (Edinger et al., 2023; Tong et al., 2018). There may be elements of contagion (Al-Mosaiwi & Johnstone, 2018; Lee & Theokary, 2021) of distorted language within certain communities with depression and anxiety disorders, but this remains to be tested.
Despite the novelty of our research in using a theory-driven approach to capture distorted language, there are several limitations to consider. First, we relied on self-reported depression and anxiety symptoms, with no verification of symptoms by a clinician. While this is an expansion of our prior work, which relied on self-disclosure of diagnoses, self-report still has problems compared with clinician-based severity ratings (Rutter et al., 2023). Clinical diagnoses of GAD and depression were not confirmed in our sample. Our measure of GAD came from the DSM-5 list of emerging measures and is not a gold-standard measure of GAD such as the GAD-7. Second, our sample was drawn from an online Qualtrics panel, and the data may be limited by low incentives to complete the research accurately (Douglas et al., 2023). Third, we had low proportions of individuals with extreme levels of depression and anxiety symptoms. While this is to be expected, the bootstrapping produced large confidence intervals, which limited interpretation of CDS at extreme levels of symptoms. A final limitation is that our CDS lexicon has not been tested against human coders of the Twitter messages. Human annotation, and extension of the lexicon with additional examples constructed by LLMs, are logical next steps for this work.
This work has implications for the future of evidence-based treatments, including online interventions that engage individuals with internalizing symptoms, but much more work is needed to determine the ways that NLP/AI can meaningfully affect treatment. Characterizing the relationship between distorted language and depression and anxiety symptoms may help in the development of automated interventions such as chatbots (see Ahmed et al., 2023, for a recent review). Moreover, CDS prevalence in the population at large can be used as a passive index of vulnerability to depression and anxiety disorders. This vulnerability index is expected to change as individuals progress through treatment or go through major stressors, such as a global pandemic, but this is yet to be explored. Using online language to distinguish trajectories of symptoms in individuals with internalizing symptoms is still a relatively novel research area (Stade et al., 2023, 2024). More work is needed to understand cognitive distortions in colloquial language use (Bollen et al., 2021) and how these distortions change with age, symptoms, context, and treatment.
Declarations
Competing Interests
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.