Systematic Review With Meta-analysis

Coffee Consumption and the Risk of Cirrhosis

O. J. Kennedy; P. Roderick; R. Buchanan; J. A. Fallowfield; P. C. Hayes; J. Parkes


Aliment Pharmacol Ther. 2016;43(5):562-574. 


We followed the PRISMA guidelines; a protocol is shown in Table S1.

Study Searches and Selection

We searched for titles of articles in PubMed, Embase (using Ovid) and Web of Science using the term: (coffee OR caffeine) AND (*liver* OR *hepat* OR *cirrh* OR *fibro*). We performed the search in July 2015 and did not restrict publication date. We also performed manual searches of reference lists of relevant studies returned by the initial search.

We included studies in this meta-analysis if they: (i) involved a case–control study, cohort study or randomised controlled trial (RCT); and (ii) provided relative risks (RRs), odds ratios (ORs) or hazard ratios (HRs), including 95% confidence intervals (95% CIs), for cirrhosis stratified by coffee consumption in adults aged 18 or older. We excluded studies if they did not provide a summary dose–effect size or allow one to be calculated, which required individual effect sizes for three or more consumption categories. We also excluded studies if they were not in English. We assumed a diagnosis of cirrhosis where studies reported hospitalisation for chronic liver disease (CLD) or death from CLD but without a diagnosis of hepatocellular carcinoma (HCC). If two or more studies reported the same data, we used the most recent study. OJK screened titles and abstracts to remove duplicates. OJK and RB independently reviewed the remaining studies.

Data Extraction and Quality Assessment

The data extraction was performed by OJK and checked by RB. The following data were extracted from each study in a standardised manner: (i) the publication date, the first author's surname and the country of origin; (ii) the study characteristics, including the design, the inclusion and exclusion criteria, the sample size, the measurement of coffee consumption, the outcome measures, and the adjustments for confounding variables; and (iii) the number of events (or cases) and non-events (or controls) and the corresponding effect size and 95% CIs for different categories of coffee consumption. For cohorts, we also extracted information concerning whether CLD was excluded at baseline, the follow-up time and the loss to follow-up. Where studies provided multiple effect sizes for a single category of coffee consumption, we extracted the effect size most comprehensively adjusted for confounders. Where studies reported effect sizes for caffeinated and decaffeinated coffee consumption separately, but not total coffee consumption, we extracted the effect sizes for caffeinated coffee. The case–control studies reported ORs, while the cohort studies reported either RRs[9,12,13] or HRs.[8,14] As the incidence rate of cirrhosis was low, we assumed the ORs, RRs and HRs were equivalent, and from here on we refer to all three as RR for simplicity. We worked from published data only and without contacting study authors.
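The rare-outcome assumption above can be illustrated numerically. The following is a minimal sketch in Python, using hypothetical risks that are not taken from any included study: when the baseline risk is low the OR is close to the RR, and the two diverge as the outcome becomes common. HRs are not covered by this sketch.

```python
def risk_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Ratio of the outcome probabilities in exposed vs unexposed groups."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Ratio of the outcome odds in exposed vs unexposed groups."""
    odds_exposed = p_exposed / (1.0 - p_exposed)
    odds_unexposed = p_unexposed / (1.0 - p_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical baseline risks: one rare outcome (1%), one common outcome (30%).
# In both cases the exposed group is given half the baseline risk (RR = 0.5).
for baseline in (0.01, 0.30):
    exposed = 0.5 * baseline
    rr = risk_ratio(exposed, baseline)
    or_ = odds_ratio(exposed, baseline)
    print(f"baseline risk {baseline:.2f}: RR = {rr:.3f}, OR = {or_:.3f}")
```

With the rare outcome the two measures agree to about two decimal places, which is the situation assumed for cirrhosis here.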

We assessed the risk of bias in individual studies using the Cochrane Risk Of Bias Assessment Tool: for Non-Randomized Studies of Interventions (ACROBAT-NRSI),[15] as has been used previously.[16] We included the following domains of bias: confounding, selection, measurement of exposure at baseline, changing exposure during follow-up, missing data (including loss to follow-up), outcome measurement and selective reporting. In accordance with the Cochrane tool, we judged each domain of bias as 'low', 'high' or 'unknown' risk. We made a single judgement for the risk of bias from 'measurement of exposure', which combined the domains 'measurement of exposure at baseline' and 'changing exposure during follow-up'. We made an overall judgement of the risk of bias for each study. We judged there to be a 'high' overall risk of bias where there was plausibility that individual domain bias would lead to bias in the reported effect estimates. We determined the overall quality of evidence supporting the effect of coffee on cirrhosis using the Grading of Recommendations Assessment, Development and Evaluation (GRADE).[17] OJK and PJR performed the risk of bias analysis and overall quality of evidence assessment separately and then discussed the results for consensus.

Statistical Analysis

Eight studies reported RRs and 95% CIs for cirrhosis. The other, Klatsky et al., reported RRs and 95% CIs for alcoholic and non-alcoholic cirrhosis separately, but not total cirrhosis. For this study, we calculated a RR and 95% CI for total cirrhosis using the method described by Hamling et al.[18]

We consider the RRs as reported in the different studies below. However, because the reported categories of coffee consumption varied between the studies, a direct comparison was not initially possible. Thus, we calculated for each study a summary RR and 95% CI for an increase in coffee consumption of two cups per day. For each study, this involved estimating the median coffee consumption in each of the reported categories. Where the consumption category was an integer (e.g. one cup per day), we used the integer as the median. Where the category was a closed range (e.g. one to three cups per day), we used the mid-point as the median. For the highest ranges, which were open-ended (e.g. >two cups per day), we used the lower end of the range plus the width of the preceding closed range for the median. If there was no preceding closed range, we used the lower end of the open-ended range plus the difference between the two preceding integers. This method was similar to those used for estimating median exposure in ranges in other meta-analyses.[19,20] After calculating median consumptions, we performed a summary dose–response analysis following the method of Greenland and Longnecker.[21] We tested for nonlinearity of the dose–response across the range of consumption reported in the studies (from zero to four or more cups per day) using a restricted cubic spline model.[22] This used data from eight studies that provided RRs for different categories of coffee consumption (Tverdal and Skurtveit did not report category-specific RRs). The P-value for nonlinearity was 0.34. We also used the cubic spline model to calculate RRs of cirrhosis for one to four cups per day compared to none.
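The rules for assigning a median consumption to each category can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the `(lower, upper)` tuple encoding and the function name `assign_doses` are assumptions made for the example, and "the preceding closed range" is interpreted as the nearest closed range before the open-ended category. It covers only the dose assignment step, not the Greenland and Longnecker trend estimation or the spline model.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of a consumption category in cups per day:
#   (1, 1)    -> the integer category "one cup per day"
#   (1, 3)    -> the closed range "one to three cups per day"
#   (4, None) -> the open-ended category with lower end four
Category = Tuple[float, Optional[float]]


def assign_doses(categories: List[Category]) -> List[float]:
    """Assign an estimated median dose to each category, in reported order."""
    doses: List[float] = []
    for i, (lower, upper) in enumerate(categories):
        if upper is not None:
            # Integer category -> the integer itself; closed range -> mid-point.
            doses.append((lower + upper) / 2.0)
            continue
        earlier = categories[:i]
        closed = [(lo, hi) for lo, hi in earlier if hi is not None and hi > lo]
        if closed:
            # Lower end plus the width of the nearest preceding closed range.
            lo, hi = closed[-1]
            doses.append(lower + (hi - lo))
            continue
        integers = [lo for lo, hi in earlier if hi is not None and hi == lo]
        if len(integers) < 2:
            raise ValueError("open-ended category needs preceding categories")
        # No closed range: lower end plus the gap between the two
        # preceding integer categories.
        doses.append(lower + (integers[-1] - integers[-2]))
    return doses


# Categories "none", "one to three" and an open-ended top category from four.
print(assign_doses([(0, 0), (1, 3), (4, None)]))
# Categories "none", "one", "two" and an open-ended top category from three.
print(assign_doses([(0, 0), (1, 1), (2, 2), (3, None)]))
```

In the first example the open-ended category receives the lower end plus the width of the one-to-three range; in the second it receives the lower end plus the one-cup gap between the two preceding integer categories.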

Using the RRs and 95% CIs for an increase of two cups of coffee per day, we calculated a pooled RR and 95% CI of cirrhosis. We used a random effects model to incorporate between-study heterogeneity, assuming the biological effects of coffee in different populations would vary randomly, at least by type, processing and measurement of coffee.[23] We examined statistical heterogeneity by performing Cochran's Q and I² tests. In accordance with the Cochrane Handbook,[24] Chapter 9.5.2, we used a P-value of <0.1 to signify statistically significant heterogeneity and we interpreted the I² values as follows: '0–40% heterogeneity might not be important; 30–60% may represent moderate heterogeneity; 50–90% may represent substantial heterogeneity; 75–100%: considerable heterogeneity'. We also examined heterogeneity by performing a sensitivity analysis, in which we calculated pooled RRs and 95% CIs while excluding studies one at a time from the analysis.[25] To examine potential publication bias, we used Egger's regression test. We did not test funnel plot symmetry to assess for publication bias due to the low power of that test when fewer than ten studies are available.[26] We performed sub-analyses to calculate the pooled RRs for cohort studies and case–control studies separately, the RR of alcoholic cirrhosis and the RR of death (i.e. with a diagnosis of cirrhosis or CLD). In order to assess confounding and the direction and magnitude of overall adjustment, we meta-analysed the crude effect sizes and compared them with the adjusted values. For this purpose, we used the reported crude effect sizes or, where not reported, we calculated crude effect sizes from the published data. We used Stata (Release 13, StataCorp LP, College Station, TX) and Mathematica (Version 10, Wolfram Research, Inc., Champaign, IL) to perform the analyses, and we used a two-sided P < 0.05 for statistical significance.
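The pooling, heterogeneity and leave-one-out steps can be sketched as follows. This is a minimal Python illustration, not the Stata or Mathematica code used for the analyses. It assumes the DerSimonian and Laird estimator of between-study variance, which the text above does not name explicitly, and it recovers each standard error from the reported 95% CI on the log scale. The study values are hypothetical. The P-value for Cochran's Q, which requires a chi-squared distribution, is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_rr_variance(ci_low: float, ci_high: float) -> float:
    """Variance of log(RR), recovered from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    return se ** 2


def pool_random_effects(
    rrs: Sequence[float], cis: Sequence[Tuple[float, float]]
) -> Dict[str, float]:
    """Random-effects pooled RR (DerSimonian-Laird assumed), with Q and I2."""
    if len(rrs) < 2 or len(rrs) != len(cis):
        raise ValueError("need at least two studies, each with a CI")
    y = [math.log(rr) for rr in rrs]
    v = [log_rr_variance(lo, hi) for lo, hi in cis]
    w = [1.0 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return {
        "rr": math.exp(y_re),
        "ci_low": math.exp(y_re - Z_95 * se_re),
        "ci_high": math.exp(y_re + Z_95 * se_re),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }


def leave_one_out(
    rrs: Sequence[float], cis: Sequence[Tuple[float, float]]
) -> List[float]:
    """Pooled RR after excluding each study in turn (needs three or more)."""
    rrs, cis = list(rrs), list(cis)
    return [
        pool_random_effects(rrs[:i] + rrs[i + 1:], cis[:i] + cis[i + 1:])["rr"]
        for i in range(len(rrs))
    ]


# Hypothetical per-study RRs and 95% CIs for a two-cup-per-day increase.
example_rrs = [0.4, 0.9, 0.6]
example_cis = [(0.30, 0.53), (0.70, 1.16), (0.40, 0.90)]
print(pool_random_effects(example_rrs, example_cis))
print(leave_one_out(example_rrs, example_cis))
```

A large shift in the pooled RR when one study is dropped would point to that study as a source of heterogeneity, which is the purpose of the sensitivity analysis described above.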