A Twitter Discourse Analysis of Negative Feelings and Stigma Related to NAFLD, NASH and Obesity

Jeffrey V. Lazarus; Christine Kakalou; Adam Palayew; Christina Karamanidou; Christos Maramis; Pantelis Natsiavas; Camila A Picchio; Marcela Villota-Rivas; Shira Zelber-Sagi; Patrizia Carrieri


Liver International. 2021;41(10):2295-2307. 


The applied process is outlined and detailed below (Figure 1).

Figure 1.

Workflow of the study. NAFLD, non-alcoholic fatty liver disease; NASH, non-alcoholic steatohepatitis

Data Gathering

Creation of the Search Terms Index. First, we created a domain-specific sentiment lexicon of terms for NAFLD/NASH and obesity, drawing on words commonly used to describe these conditions. For obesity, we reviewed and added relevant synonyms from WordNet, Thesaurus.com and Urban Dictionary, as well as related stigmatizing/discriminatory terms suggested by domain and lay experts. The resulting terms index is provided in Appendix 1.

Data Collection. The data collection was piloted on 29 April 2019 to improve the lexicon used to form the Twitter search queries, which was ultimately based on the main terms related to NAFLD/NASH identified in Unified Medical Language System® terminology. Phase I of data collection, covering NAFLD/NASH, started on 2 May 2019. Because few NAFLD/NASH tweets intersected with obesity, additional data were collected from November to December 2019 (Phase II), when only obesity-specific keywords such as 'obese', 'fat' and 'overweight' were used to select tweets. Ultimately, there were three datasets: one for NAFLD (Phase I), one for NASH (Phase I) and one for obesity (Phase II). At the time of the study, Twitter data were only available for real-time collection (ie streaming data), which precluded any retrospective/historical collection of tweets.
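The keyword-based selection described above can be illustrated with a minimal Python sketch. It is not the study's collection code: the three keywords are the examples quoted in the text (the full index is in Appendix 1), the tweet dictionaries with a "text" field are an assumed format, and whole-word, case-insensitive matching is an assumption about how keywords might be applied to an incoming stream.

```python
import re

# Example keywords quoted in the text; the study's full index is in its Appendix 1.
OBESITY_KEYWORDS = ["obese", "fat", "overweight"]


def build_keyword_pattern(keywords):
    """Compile a case-insensitive regex matching any keyword as a whole word."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def filter_stream(tweets, keywords):
    """Yield only the tweets whose text contains at least one keyword.

    `tweets` is any iterable of dicts with a "text" key (an assumed format),
    so the same function works on a live stream or on a stored sample.
    """
    pattern = build_keyword_pattern(keywords)
    for tweet in tweets:
        if pattern.search(tweet["text"]) is not None:
            yield tweet


if __name__ == "__main__":
    sample = [
        {"text": "Being called FAT hurts"},
        {"text": "Happy father's day"},
        {"text": "Charged for overweight luggage"},
    ]
    for kept in filter_stream(sample, OBESITY_KEYWORDS):
        print(kept["text"])
```

The second sample tweet is dropped because 'fat' only matches as a whole word, while the third is kept even though it is unrelated to obesity as a health condition. False positives of this kind are one reason a relevance check is useful after collection.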

Annotation of the Data

The DataTurks platform[17] was used to annotate specific fragments of interest (obesity, stigma, NASH, NAFLD, irrelevant fragment, irrelevant content, scientific study, information, society/non-governmental organization (NGO), medical professional, other disease, other) which were determined independently by three of the researchers. Definitions of the tags used can be found below (Box 1). The labels assigned to the tweets collected during Phase I were not mutually exclusive and multiple references to a label could be noted in a single tweet.

Tweets of the obesity dataset (Phase II) were annotated as positive, neutral or negative according to their inclusion of stigmatizing language. Classification of tweets was carried out independently by two of the researchers. A third researcher evaluated each annotation as either correct or incorrect and, where it was judged incorrect, suggested the correct sentiment label (positive, neutral or negative). A fourth researcher reviewed any conflicts between the initial annotations and the evaluation and resolved them with a final decision.
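The review-and-resolve logic of this annotation workflow can be sketched as follows. This is an illustrative reading of the paragraph above, not the authors' tooling: the function name and arguments are hypothetical, and the sketch treats each tweet as having one initial label, since the text does not state how the two annotators' work was combined.

```python
from typing import Optional

LABELS = {"positive", "neutral", "negative"}


def final_label(initial: str,
                suggested: Optional[str] = None,
                decision: Optional[str] = None) -> str:
    """Return the label a tweet ends up with after review.

    initial   -- label from the initial annotation
    suggested -- label proposed by the evaluating researcher, or None if
                 the initial annotation was judged correct
    decision  -- final decision by the resolving researcher, required only
                 when the suggestion conflicts with the initial label
    """
    for label in (initial, suggested, decision):
        if label is not None and label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")

    # No conflict: the evaluator accepted the initial annotation.
    if suggested is None or suggested == initial:
        return initial

    # Conflict: a final decision is needed to resolve it.
    if decision is None:
        raise ValueError("conflict requires a final decision")
    return decision


if __name__ == "__main__":
    print(final_label("negative"))                        # accepted as is
    print(final_label("neutral", "negative", "negative"))  # resolved conflict
```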

Data Analysis

The data analysis comprised five parts (see Appendices 2 and 3 for more details): (a) the geographic distribution of tweets; (b) the content breakdown of labelled tweets; (c) the hashtag analysis; (d) the domain and uniform resource locator (URL) analysis; and (e) the automatic sentiment polarity assessment via a self-developed natural language processing (NLP) module. For the last of these, a custom sentiment polarity analysis pipeline was developed based on established NLP techniques.[18–22] The pipeline first assessed the relevance of a tweet (yes/no) and then assessed the sentiment polarity of each relevant tweet (positive/neutral/negative), ie whether it included stigmatizing language.
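The two-stage structure of the pipeline (relevance first, polarity only for relevant tweets) can be sketched in Python as below. The study's actual models are described in its appendices and are not reproduced here; the two toy classifiers and their cue words are hypothetical placeholders chosen only so that the example runs, and any real classifier with the same call signature could be substituted.

```python
from typing import Callable, Optional


def toy_relevance(text: str) -> bool:
    """Placeholder relevance check based on a few assumed topic words."""
    lowered = text.lower()
    return any(word in lowered for word in ("obese", "obesity", "overweight"))


def toy_polarity(text: str) -> str:
    """Placeholder polarity check based on a few assumed cue words."""
    lowered = text.lower()
    if any(word in lowered for word in ("lazy", "disgusting")):
        return "negative"
    if any(word in lowered for word in ("support", "respect")):
        return "positive"
    return "neutral"


def classify_tweet(text: str,
                   is_relevant: Callable[[str], bool] = toy_relevance,
                   polarity: Callable[[str], str] = toy_polarity) -> Optional[str]:
    """Stage 1: discard irrelevant tweets. Stage 2: label the rest.

    Returns None for irrelevant tweets, otherwise one of
    "positive", "neutral" or "negative".
    """
    if not is_relevant(text):
        return None
    return polarity(text)


if __name__ == "__main__":
    for tweet in ("Nice weather today",
                  "Obesity rates rose in 2019",
                  "People with obesity deserve respect"):
        print(tweet, "->", classify_tweet(tweet))
```

Running the relevance stage first means the polarity model is never asked to label off-topic tweets, so its output applies only to tweets that are actually about the condition.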

Web Application for Monitoring and Data Collection Hosting

An interactive web application was deployed online, hosted by the Institute of Applied Biosciences (INAB), Greece. It provides an overview of the collected data, with a suitable visualization for each aspect of the annotation and analysis presented in a separate tab. Both raw and processed data are presented as sortable tables. Additionally, the visualizations are live, and clicking on a data point shows the underlying information.

Data gathering and analyses relied on an in-house research platform developed and maintained by the Institute of Applied Biosciences, in the Centre for Research & Technology Hellas (INAB|CERTH), Thessaloniki, Greece, written by two of the researchers, with input from the Health Systems Research Team at the Barcelona Institute for Global Health (ISGlobal), University of Barcelona, Spain. The application containing the analysis and the raw data can be found at: https://osf.io/aw3p4/.