A Twitter Discourse Analysis of Negative Feelings and Stigma Related to NAFLD, NASH and Obesity

Jeffrey V. Lazarus; Christine Kakalou; Adam Palayew; Christina Karamanidou; Christos Maramis; Pantelis Natsiavas; Camila A Picchio; Marcela Villota-Rivas; Shira Zelber-Sagi; Patrizia Carrieri


Liver International. 2021;41(10):2295-2307. 

Phase I: NAFLD and NASH

Results of Searches

Overall, 18 275 tweets were retrieved globally for NAFLD and 2621 for NASH. After removal of non-English tweets, the final datasets consisted of 16 835 tweets for NAFLD and 2376 tweets for NASH. The majority of the English tweets collected were retweets: 9235 (54.9%) for NAFLD and 1366 (57.5%) for NASH.

Geographic Distribution

There were 11 777 (70.0%) tweets for NAFLD and 1796 (75.6%) for NASH which contained geographic information (self-reported bio-location). Most of the content generated on Twitter for NAFLD and NASH came from North America (53.2%) and the European Union (26.0%; Figure 2). For the NAFLD dataset, a large amount of content also came from India (4.7% of total tweets and 41.9% of tweets in Asia), specific locations in South America (2.8% of total tweets, mainly from Brazil and Chile), Africa (Nigeria with 1.8% of total tweets and 40.1% of tweets in Africa, and South Africa with 1.1% of total tweets and 24.3% of tweets in Africa) and Oceania (Australia with 2.0% of total tweets and 76.2% of tweets in Oceania).

Figure 2.

Geographic distribution of: (A) NAFLD and (B) NASH tweets. NAFLD, non-alcoholic fatty liver disease; NASH, non-alcoholic steatohepatitis

Content Analysis

To obtain a representative picture of the content on NAFLD and NASH on Twitter, a random sample of tweets was selected and annotated. We manually labelled 1130 NAFLD and 535 NASH tweets. Among the NAFLD tweets, 245 shared some sort of unverified information, compared to 75 that shared scientific studies. Additionally, 95 of the NAFLD tweets referenced other diseases, 97 referenced obesity and 29 referenced NASH. Furthermore, 14 of the NAFLD tweets contained an indication that they were from a medical professional, 13 contained a reference to stigma and 3 were indicative of being from a society or NGO. Among the 535 NASH tweets, 274 contained unverified information, 160 contained references to NAFLD, 183 to scientific studies, 106 to other diseases and 63 to obesity; 36 were indicative of coming from a medical professional and 25 from a society or NGO; and 5 contained stigmatizing terms. Examples of the different types of tweets for each label are found below (Table 1).

Obesity terms detected in the datasets included fat, obese, overweight, heavy and big for NAFLD and fat, obese, overweight, big and large for NASH. Stigmatizing terms included gluttonous, gross, flabby, inconvenient and lazy. An example of a stigmatizing tweet addressed to another Twitter user is as follows: 'Keep eating fat boy.. you know what you are doing don't act like you done and we won't cry for you either when your fatty liver finally dies are you stuffing your emotions down with Big Macs? Awe #cantdealwithlifewithouticecream'. The full list of the terms is available in the appendices (Appendix 1).

Hashtag Analysis

Among the tweets collected during Phase I, 6486 (38.5%) NAFLD and 1399 (58.9%) NASH tweets contained one or more hashtags, the most common being related to liver health. For the accounts identified as belonging to medical professionals, the hashtags used closely matched those in the overall hashtag analysis (Figure 3). However, some key hashtags, such as #pathology, were specific to medical users and not widely used by the public. For NGOs and societies, the number of hashtags was too low to analyse, reflecting the small number of tweets that came from these organizations, which is a finding in itself.
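As an illustration of the kind of tally reported above, the following is a minimal sketch, not the authors' pipeline. It assumes each tweet is available as a plain text string; the function name `hashtag_summary`, the regular expression and the sample tweets are hypothetical. Each hashtag is counted once per tweet and case is ignored, which is one reasonable convention among several.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_summary(tweets):
    """Return (number of tweets with at least one hashtag, per-hashtag tweet counts)."""
    with_tags = 0
    counts = Counter()
    for text in tweets:
        # A set counts each hashtag at most once per tweet, case-insensitively.
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        if tags:
            with_tags += 1
        counts.update(tags)
    return with_tags, counts


# Hypothetical sample, for illustration only.
sample_tweets = [
    "New study on fatty liver #NAFLD #liverhealth",
    "Biopsy slides today #pathology #nafld",
    "No tags in this one",
]
n_with_tags, tag_counts = hashtag_summary(sample_tweets)
share_with_tags = 100 * n_with_tags / len(sample_tweets)
```

Calling `tag_counts.most_common(10)` would then give the most frequent hashtags, the kind of ranking a figure such as Figure 3 summarizes.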

Figure 3.

Most frequent hashtags for: (A) NAFLD and (B) NASH. NAFLD, non-alcoholic fatty liver disease; NASH, non-alcoholic steatohepatitis

URLs and Domain Evaluation

For NAFLD, 12 261 (78.8%) tweets and for NASH, 2054 (86.5%) tweets contained one or more URLs. For each domain, we calculated the log of the follower count of all the accounts sharing a link from it, as an indication of its reach (Figure 4). We examined 394 URLs and found that for NAFLD there were 40.5% (n = 123) extremely untrustworthy sources; 19.7% (n = 60) not trustworthy sources; 9.3% (n = 28) slightly trustworthy sources; 17.8% (n = 54) quite trustworthy sources and 12.8% (n = 39) very trustworthy sources (Appendix 3).
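The reach measure can be sketched as follows. This is a hedged illustration rather than the study's code: the text does not state the base of the logarithm or how followers were aggregated, so the sketch assumes base 10 and a sum of followers over distinct accounts, with 1 added to avoid taking the log of zero. The function name `domain_reach` and the input layout are hypothetical.

```python
import math
from collections import defaultdict
from urllib.parse import urlparse


def domain_reach(shares):
    """Map each domain to log10(1 + total followers of distinct sharing accounts).

    shares: iterable of (account_id, follower_count, url) tuples (assumed layout).
    """
    followers_by_domain = defaultdict(dict)
    for account_id, followers, url in shares:
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        # Keyed by account so an account sharing twice is counted once.
        followers_by_domain[domain][account_id] = followers
    return {
        domain: math.log10(1 + sum(accounts.values()))
        for domain, accounts in followers_by_domain.items()
    }


# Hypothetical shares, for illustration only.
example_shares = [
    ("a", 900, "https://www.example.org/x"),
    ("b", 99, "http://example.org/y"),
    ("a", 900, "https://example.org/z"),
    ("c", 9, "https://blog.example.com/p"),
]
reach = domain_reach(example_shares)
```

The log scale compresses the very skewed distribution of follower counts, so that a handful of accounts with millions of followers does not dominate a plot of trustworthiness against reach.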

Figure 4.

Breakdown of the trustworthiness of domains in the tweets as rated by researchers for the: (A) NAFLD and (B) NASH datasets. The centre of the red squares represents the mean for each group and the centre of the blue square represents the median for each group. NAFLD, non-alcoholic fatty liver disease; NASH, non-alcoholic steatohepatitis

Phase II: Obesity

Results of Searches, Content and Sentiment Analysis and Geographical Distribution

Additional data were collected from 19 November 2019 to 5 December 2019, with over 10 million (10 052 845) tweets being collected, of which 9 176 806 (91.3%) were in English. For the obesity dataset, 5444 randomly selected tweets were classified as positive, neutral, negative or irrelevant by two of the researchers. When the self-developed NLP data processing pipeline was applied, these tweets were classified for their relevance and sentiment polarity, with 2542 (46.6%) being classified as relevant.

This machine learning (ML)/NLP processing software was used on 193 747 non-annotated tweets, randomly selected from the original dataset of over 10 million tweets, and the custom sentiment analysis pipeline found 77 505 relevant tweets (40.0%). Of the relevant tweets, 85.2% were classified as negative, 1.0% as positive and 13.8% as neutral. Using the sentiment analysis tools described in the methods, the default setting classified 53.1% of the relevant tweets as negative, 12.0% as neutral and 35.0% as positive. In comparison, the slang setting of the analysis classified 89.6% of tweets as negative, 6.5% as neutral and 3.9% as positive. The self-developed ML program was validated by cross-validation on the final annotated dataset, reaching 93.7% accuracy and 90.8% precision in relevance classification; high precision limits the number of irrelevant tweets included in the dataset to be further explored. Regarding sentiment polarity, the custom sentiment analysis pipeline achieved an accuracy of 88.7% in the cross-validation process using the final dataset after annotation conflict resolution.
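For readers less familiar with the two validation metrics, the sketch below shows how accuracy and precision are computed from annotated and predicted labels. It is illustrative only and does not reproduce the study's cross-validation; the function name, the label strings and the toy data are hypothetical.

```python
def accuracy_and_precision(y_true, y_pred, positive="relevant"):
    """Return (accuracy, precision) for the class named by `positive`.

    accuracy  = correct predictions / all predictions
    precision = true positives / (true positives + false positives)
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled example is required")
    correct = true_pos = false_pos = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct += 1
        if pred == positive:
            if truth == positive:
                true_pos += 1
            else:
                false_pos += 1
    predicted_pos = true_pos + false_pos
    # Precision is undefined when nothing is predicted positive; report None.
    precision = true_pos / predicted_pos if predicted_pos else None
    return correct / len(y_true), precision


# Toy annotations and predictions, for illustration only.
annotated = ["relevant", "relevant", "irrelevant", "irrelevant", "relevant"]
predicted = ["relevant", "irrelevant", "relevant", "irrelevant", "relevant"]
acc, prec = accuracy_and_precision(annotated, predicted)
```

Precision is the share of tweets labelled relevant by the classifier that the annotators also considered relevant, so a precision of 90.8% means that roughly one in eleven tweets passed on as relevant would be expected to be irrelevant.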

Furthermore, we examined the geographic breakdown of the tweets for each of the sentiment analysis techniques described above; the geographic distribution of the manually annotated tweets is presented below (Figure 5). The majority of positive and neutral sentiment tweets came from North America and Europe, whereas negative tweets came from almost every country.

Figure 5.

Locations of tweets with negative, neutral and positive polarity regarding obesity