Review Article

Shared Disease Mechanisms Between Nonalcoholic Fatty Liver Disease and Metabolic Syndrome

Translating Knowledge From Systems Biology to the Bedside

Silvia Sookoian; Carlos J. Pirola


Aliment Pharmacol Ther. 2019;49(5):516-527. 

Materials and Methods

Search Strategy

The list of genes (human protein-coding genes) for each phenotype of interest was generated by searching the Genie web server. This resource performs literature-enrichment analysis by computing associations of genes with keywords using biomedical literature annotations. The Genie algorithm takes a biological topic as input, evaluates the entire MEDLINE database for relevance to that subject, and then evaluates all the genes for the specified organism (Homo sapiens) according to the relevance of their associated MEDLINE records.[14] The data are sourced from both NCBI (eg Entrez Gene) and NLM databases (eg PubMed and MeSH). Specifically, genes are identified by official gene symbols or Entrez Gene IDs, whereas diseases are identified by MeSH-C (Medical Subject Headings-diseases) terms. Disease-related citations are retrieved from annotations in PubMed, and gene-related citations are retrieved from the Entrez Gene database. The literature search included all studies published before October 2018.

Briefly, the Genie server first retrieves a sample set of abstracts (N = 1000) that are representative of the keyword(s) used in the search. This sample then serves as a training set, providing a statistical model for further record selection in MEDLINE; all abstracts associated with the genes in the NCBI gene database are evaluated by Bayesian statistics and are assigned a P-value representing the confidence of the classification.[14] The final list of genes is sorted by the false discovery rate (FDR), computed by applying the Benjamini-Hochberg method (FDR < 0.05).
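The Benjamini-Hochberg step mentioned above can be illustrated with a short sketch. This is not the Genie implementation; it only shows how a set of per-gene P-values could be converted to FDR-adjusted values and filtered at 0.05. The function names and the gene symbols (GENE_A to GENE_D) are hypothetical placeholders, and the P-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(gene_p_values, fdr=0.05):
    """Sort genes by adjusted p-value and keep those below the FDR cutoff."""
    genes = list(gene_p_values)
    adjusted = benjamini_hochberg([gene_p_values[g] for g in genes])
    ranked = sorted(zip(genes, adjusted), key=lambda pair: pair[1])
    return [(gene, q) for gene, q in ranked if q < fdr]


# Hypothetical input: placeholder gene symbols with made-up p-values.
example = {"GENE_A": 0.01, "GENE_B": 0.04, "GENE_C": 0.03, "GENE_D": 0.20}
for gene, q in significant_genes(example):
    print(gene, round(q, 4))
```

In this toy input only GENE_A passes the cutoff, because the adjustment scales each P-value by the number of tests divided by its rank.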

We used the following list of keywords to perform the search in the Genie server: non-alcoholic fatty liver disease, diabetes mellitus, dyslipidemia, hypertension, obesity, inflammation and fibrosis. The search within the Genie server included automatic annotation of MeSH descriptors using the MeSH browser of the US National Library of Medicine.

Enrichment Analysis

To integrate the list of genes/proteins associated with the diseases and phenotypes of interest into biological pathways, we performed functional enrichment analysis using the FunRich resource. Gene enrichment analysis for biological processes was based on annotations in the Gene Ontology database. The FunRich resource implements the Bonferroni and Benjamini-Hochberg (BH) methods, the latter also known as the FDR (false discovery rate) method, to correct for multiple testing.
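As an illustration of what a functional enrichment test computes, the sketch below uses a one-sided hypergeometric test followed by a Bonferroni correction. The hypergeometric test is a common choice for this kind of analysis and is assumed here for illustration; the text does not state which test FunRich applies internally. The function names and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(overlap, list_size, term_size, background_size):
    """One-sided hypergeometric p-value: P(X >= overlap).

    overlap:         input genes annotated with the term
    list_size:       number of genes in the input list
    term_size:       background genes annotated with the term
    background_size: total number of background genes
    """
    total = comb(background_size, list_size)
    upper = min(list_size, term_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical counts: 8 of 50 input genes carry a term that annotates
# 200 of 20000 background genes; 1000 terms are tested in total.
p = enrichment_p_value(8, 50, 200, 20000)
print(p, bonferroni(p, 1000))
```

The Bonferroni adjustment multiplies each P-value by the number of terms tested, which is more conservative than the Benjamini-Hochberg procedure.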

The PANTHER resource was used to assess Reactome pathways.

The connectivity network among genes of interest was generated using the STRING website, a database of known and predicted protein-protein interactions.

Analysis of Pleiotropy

To investigate the level of pleiotropy in the list of shared genes/transcription factors associated with NAFLD, the components of the metabolic syndrome, and inflammation and fibrosis, we used the ToppCluster tool. ToppCluster performs gene functional enrichment analyses of associations between the input gene list and "human diseases" based on information contained in the following datasets: Clinical Variations, DisGeNET BeFree, DisGeNET Curated, GWAS catalog and OMIM (Online Mendelian Inheritance in Man), as specified in the ToppCluster pipeline.[15]
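The notion of a "shared" gene list can be made concrete with a minimal sketch, assuming one gene list per phenotype. It is not part of the ToppCluster pipeline; it only shows how genes common to all phenotypes, and the number of phenotypes linked to each gene, could be derived. Phenotype keys and gene symbols (GENE_A to GENE_D) are placeholders, not results.

```python
from collections import Counter


def shared_genes(gene_lists):
    """Return the genes that appear in every phenotype's gene list."""
    sets = [set(genes) for genes in gene_lists.values()]
    return set.intersection(*sets) if sets else set()


def phenotype_counts(gene_lists):
    """Count, for each gene, how many phenotypes it is associated with."""
    counts = Counter()
    for genes in gene_lists.values():
        counts.update(set(genes))  # set() so duplicates count once
    return counts


# Hypothetical per-phenotype lists with placeholder gene symbols.
lists = {
    "NAFLD": ["GENE_A", "GENE_B", "GENE_C"],
    "obesity": ["GENE_A", "GENE_C", "GENE_D"],
    "fibrosis": ["GENE_A", "GENE_C"],
}
print(sorted(shared_genes(lists)))
print(phenotype_counts(lists).most_common())
```

A gene that recurs across many phenotype lists is a candidate for the disease-level enrichment described above.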

Functional enrichment analysis of related clinical phenotypes was also performed with the FunRich tool, which uses the OMIM database to estimate the percentage of homology between an input list of genes and the information stored in the entire OMIM database.

Gene-Drug Interaction Network

Analysis of gene and drug/chemical interactions was performed using the ToppCluster resource.[15] Drug annotations in ToppCluster are based on information retrieved from Broad Institute CMAP Down, CTD (Comparative Toxicogenomics Database), DrugBank and STITCH.

Integration of Functional OMIC Data

Analysis of gene (mRNA) expression levels in liver tissue was conducted by retrieving data from the Genotype-Tissue Expression (GTEx) Consortium (data source: GTEx Analysis Release V7). Global protein expression levels were explored using FunRich, based on information retrieved from the UniProt database.

A summary of our work strategy is shown in Figure S1.

Supplementary Figure S1.

Summary of the work strategy