Comprehensive Assessment of Diet Quality and Risk of Precursors of Early-Onset Colorectal Cancer

Xiaobin Zheng, MD, PhD; Jinhee Hur, PhD; Long H. Nguyen, MD, MS; Jie Liu, MD; Mingyang Song, MD, ScD; Kana Wu, MD, PhD; Stephanie A. Smith-Warner, PhD; Shuji Ogino, MD, MS, PhD; Walter C. Willett, MD, DrPH; Andrew T. Chan, MD, MPH; Edward Giovannucci, MD, ScD; Yin Cao, MPH, ScD


J Natl Cancer Inst. 2021;113(5):543-552. 



Study Population

The NHSII is a prospective cohort study of 116 430 US female nurses aged 25 to 42 years at enrollment in 1989. Participants were followed biennially with self-administered questionnaires on demographics, lifestyle factors, and medical diagnoses. Dietary intake was assessed every 4 years through mailed food frequency questionnaires (FFQs). Return of the completed questionnaire implied informed consent to participate in the study. Overall, the active follow-up rate was approximately 90%.[24]

In the current analysis, study baseline was set as 1991, the time of the initial FFQ assessment. At baseline and at each biennial follow-up cycle, we excluded participants with a prior diagnosis of CRC or inflammatory bowel disease or a history of colorectal polyps. After additional exclusions for missing data on any of the exposures or implausible reported energy intake (<600 or >3500 kcal/d), we identified 59 013 participants who had undergone at least 1 lower endoscopy before 2011 (the end of follow-up). We further restricted our primary analyses to 29 474 women younger than 50 years (Supplementary Figure 1, available online). The study protocol was approved by the institutional review boards of the Brigham and Women's Hospital and Harvard T.H. Chan School of Public Health and those of participating state cancer registries as required.
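The energy-intake exclusion can be illustrated with a minimal sketch. The function name and the inclusive handling of the boundary values are assumptions made for illustration; the text specifies only that intakes below 600 or above 3500 kcal/d were treated as implausible.

```python
def has_plausible_energy_intake(kcal_per_day, lower=600, upper=3500):
    """Return True when reported intake is within the plausible range.

    Intakes <600 or >3500 kcal/d are treated as implausible, so the
    boundary values themselves are retained (an assumption of this sketch).
    """
    return lower <= kcal_per_day <= upper


# Hypothetical reported intakes (kcal/d), for illustration only.
reported = [450, 600, 1800, 3500, 4200]
retained = [kcal for kcal in reported if has_plausible_energy_intake(kcal)]
```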

Ascertainment of Colorectal Adenoma

On each biennial questionnaire, participants reported whether they had undergone lower endoscopy and the corresponding reason(s). Investigators masked to exposure information reviewed all the retrieved medical records and extracted data on anatomical location, size, histological type, and number of polyps. If more than 1 adenoma was diagnosed, size and histology were categorized by the largest and most advanced polyp, respectively. Cases and noncases were defined every 2 years and updated through the 2011 questionnaire cycle: all confirmed newly diagnosed adenomas (tubular, villous, tubulovillous, or with high-grade dysplasia) were considered as cases and individuals who had a lower endoscopy but reported no adenomas as noncases.

We further categorized adenomas according to their malignant potential.[25] High-risk adenomas were defined as adenomas with any of the following features: 1 cm or more in size, tubulovillous or villous histology, high-grade dysplasia, or the presence of 3 or more adenomas. Low-risk adenomas included all other adenomas. We defined advanced adenomas considering only size and histology.[26] Adenomas in the cecum, ascending colon, hepatic flexure, and transverse colon were classified as proximal adenomas; those in the splenic flexure, descending colon, and sigmoid colon as distal colonic adenomas; and those in the rectum or rectosigmoid junction as rectal adenomas.[27]
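The classification rules above can be sketched in code. This is an illustrative sketch only, not the study's software: the record type, field names, and string labels are hypothetical, and only the high-risk vs low-risk and anatomical-site rules stated in the text are encoded.

```python
from dataclasses import dataclass

# Site groupings as stated in the text; the lowercase labels are assumptions.
PROXIMAL = {"cecum", "ascending colon", "hepatic flexure", "transverse colon"}
DISTAL = {"splenic flexure", "descending colon", "sigmoid colon"}
RECTAL = {"rectum", "rectosigmoid junction"}


@dataclass
class AdenomaFinding:
    """Hypothetical summary of one endoscopy with at least 1 adenoma."""
    largest_size_cm: float          # size of the largest adenoma
    most_advanced_histology: str    # "tubular", "tubulovillous", or "villous"
    high_grade_dysplasia: bool
    adenoma_count: int


def risk_category(finding):
    """High-risk if any listed feature is present; otherwise low-risk."""
    is_high_risk = (
        finding.largest_size_cm >= 1.0
        or finding.most_advanced_histology in {"tubulovillous", "villous"}
        or finding.high_grade_dysplasia
        or finding.adenoma_count >= 3
    )
    return "high-risk" if is_high_risk else "low-risk"


def location_category(site):
    """Map an anatomical site to proximal, distal, or rectal."""
    key = site.strip().lower()
    if key in PROXIMAL:
        return "proximal"
    if key in DISTAL:
        return "distal"
    if key in RECTAL:
        return "rectal"
    raise ValueError(f"unrecognized site: {site!r}")
```

For example, a single 0.5-cm tubular adenoma without high-grade dysplasia would be low-risk under these rules, whereas 3 such adenomas would be high-risk.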

Assessment of Diet Quality

Every 4 years since 1991, participants self-reported average food intake over the preceding year via validated semiquantitative FFQs.[28] Briefly, to capture food consumption frequency, 9 response options were provided, ranging from "never or less than once per month (referred to as never)" to "6 or more times per day". Total nutrient intake was calculated as the sum, across all food items, of each item's consumption frequency multiplied by its nutrient content per standard portion.
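As a minimal sketch of this calculation, total intake of a nutrient is the frequency-weighted sum of per-portion nutrient content. The food names, frequencies, and nutrient values below are hypothetical, and the conversion of FFQ response categories into servings per day is assumed to have been done already.

```python
def total_nutrient_intake(servings_per_day, nutrient_per_portion):
    """Sum of (consumption frequency x nutrient content per standard portion).

    servings_per_day: dict mapping food item -> average servings per day
    nutrient_per_portion: dict mapping food item -> nutrient amount per portion
    """
    return sum(
        frequency * nutrient_per_portion[food]
        for food, frequency in servings_per_day.items()
    )


# Hypothetical values for illustration only (fiber, g per standard portion).
servings = {"apple": 1.0, "whole-grain bread": 2.0, "yogurt": 0.5}
fiber_g = {"apple": 4.0, "whole-grain bread": 2.0, "yogurt": 0.0}

daily_fiber = total_nutrient_intake(servings, fiber_g)
```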

Food items on the FFQ were categorized into 40 groups, and factor analysis was performed to derive 2 dominant dietary patterns, the Western and prudent diets,[29] for which reproducibility and validity have been documented.[30]

To capture the adherence to major dietary recommendations, we derived Dietary Approaches to Stop Hypertension (DASH),[31] Alternative Mediterranean Diet (AMED),[32] and Alternative Healthy Eating Index-2010 (AHEI-2010).[33] The DASH score consisted of 8 components and ranged from 8 to 40. The AMED score consisted of 9 components and ranged from 0 to 9. The AHEI-2010 score consisted of 11 items and ranged from 0 to 110. Scoring methods and dietary components are provided in Supplementary Table 1 (available online). For the 3 indexes, a higher score reflects higher diet quality.
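The reported ranges are what one would obtain if every component of an index shared the same minimum and maximum points. The per-component bounds below are an assumption inferred only from the totals stated above; the actual scoring rules are those in Supplementary Table 1.

```python
# (number of components, assumed minimum points, assumed maximum points)
ASSUMED_COMPONENT_BOUNDS = {
    "DASH": (8, 1, 5),
    "AMED": (9, 0, 1),
    "AHEI-2010": (11, 0, 10),
}


def score_range(n_components, component_min, component_max):
    """Lowest and highest possible total when all components share bounds."""
    return n_components * component_min, n_components * component_max


ranges = {
    name: score_range(*bounds)
    for name, bounds in ASSUMED_COMPONENT_BOUNDS.items()
}
```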

Statistical Analysis

To represent long-term intake, reflect true changes in diet, and reduce random within-person variation by increasing the number of measurements, we calculated the cumulative average of all dietary scores available from 1991 to the questionnaire cycle (2 years) prior to the most recent endoscopy.[34] As primary analyses, we first investigated the associations between diet quality (Western and prudent patterns; DASH, AMED, and AHEI-2010 scores, all in period-specific quintiles) and risk of early-onset adenoma overall and according to high-risk vs low-risk adenoma. The associations between each of the dietary indexes and early-onset adenoma were evaluated in separate models in the entire study population. As secondary analyses, we further examined the associations by anatomical location, size, and histology. We evaluated the associations according to malignant potential in 2 logistic regressions using the same reference group: 1 for high-risk vs no adenoma, and the other for low-risk vs no adenoma, and similarly for comparisons according to size and histology. The joint association of Western and prudent dietary patterns with risk of early-onset high-risk adenoma was further tested to take into account the combination of 2 distinct dietary patterns. Because some early-onset adenomas are first detected through average-risk screening when participants have not had an endoscopy at younger ages,[35,36] we performed sensitivity analyses stratified by age at endoscopy (younger than 45 years vs 45 years and older). We also conducted a sensitivity analysis restricted to those who had a colonoscopy, to address the likelihood of proximal adenomas not being detected if participants only had a sigmoidoscopy. To replace missing exposure data in subsequent cycles, we carried forward nonmissing dietary intake values from the prior questionnaire cycle. Missing data for covariates were treated similarly.
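The cumulative averaging and carry-forward steps can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names and the example scores are hypothetical, and counting carried-forward values toward the average is one possible reading of the text, not a confirmed detail of the study's code.

```python
def carry_forward(values):
    """Replace each missing value (None) with the last nonmissing value."""
    filled, last_seen = [], None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


def cumulative_average(score_by_year, last_cycle_year):
    """Mean of all scores from the first FFQ through last_cycle_year.

    score_by_year: dict mapping FFQ year -> score, or None if missing.
    Missing scores are filled by carry-forward before averaging
    (an assumption of this sketch).
    """
    years = sorted(y for y in score_by_year if y <= last_cycle_year)
    filled = carry_forward([score_by_year[y] for y in years])
    usable = [s for s in filled if s is not None]
    if not usable:
        raise ValueError("no dietary score available up to this cycle")
    return sum(usable) / len(usable)


# Hypothetical DASH scores; the 1995 questionnaire is missing.
dash_scores = {1991: 20, 1995: None, 1999: 26, 2003: 30}
average_through_1999 = cumulative_average(dash_scores, 1999)
```

Here the 1995 value is filled with the 1991 score of 20, so the average through 1999 is taken over 20, 20, and 26; the 2003 score is ignored because it falls after the selected cycle.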

Similar to prior work,[27,37–39] we identified the case-control sets every 2 years among participants with a lower endoscopy during the same period. Once a participant was diagnosed with an adenoma, she was censored in all subsequent follow-up cycles.[27] To account for the possibility that an individual may have undergone multiple endoscopies over the study period and to handle time-varying exposures and covariates efficiently, we constructed a new record for each 2-year follow-up period during which a participant underwent a lower endoscopy, using an Andersen-Gill data structure. Age-adjusted and multivariable logistic regressions for clustered data (PROC GENMOD) were used to account for repeated observations and to estimate odds ratios (ORs) and 95% confidence intervals (CIs). Tests for trend were conducted by modeling the median of each quintile of dietary patterns and scores as a continuous variable.
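The trend variable can be illustrated with a short sketch in which each participant's score is replaced by the median of her quintile. The analyses themselves were run in SAS; this Python version is only an illustration, the function name is hypothetical, the quintile cut points use the default method of the standard library's `statistics.quantiles`, and the period-specific derivation of quintiles is omitted for brevity.

```python
import statistics


def quintile_median_values(scores):
    """Replace each score with the median score of its quintile.

    The returned values would be entered into the regression model as a
    single continuous variable to test for linear trend.
    """
    cut_points = statistics.quantiles(scores, n=5)  # 4 cut points

    def quintile_index(score):
        # Number of cut points strictly below the score: 0 (lowest) to 4.
        return sum(score > cut for cut in cut_points)

    members = {}
    for score in scores:
        members.setdefault(quintile_index(score), []).append(score)
    medians = {q: statistics.median(vals) for q, vals in members.items()}
    return [medians[quintile_index(score)] for score in scores]


# Hypothetical scores for 10 participants.
trend_values = quintile_median_values([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```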

In age-adjusted models, we controlled for age, total caloric intake, time period of endoscopy, number of reported endoscopies, time in years since the most recent endoscopy, and reason for the current endoscopy. In multivariable models, we additionally adjusted for the following potential confounders: height,[40] body mass index,[41] history of CRC in a first-degree relative,[42] menopausal status,[43] menopausal hormone use,[44] personal history of type 2 diabetes,[45] pack-years of smoking,[46] physical activity in metabolic equivalent of task-hours,[47] current use of multivitamin,[48] and regular use (≥2 times per week) of aspirin[49] or nonsteroidal anti-inflammatory drugs.[50] For the DASH diet, we additionally adjusted for alcohol intake. All analyses were performed using SAS 9.4 (SAS Institute, Inc, Cary, NC). Two-sided P values less than .05 were considered statistically significant.