Genetically Predicted Levels of DNA Methylation Biomarkers and Breast Cancer Risk

Data From 228,951 Women of European Descent

Yaohua Yang; Lang Wu; Xiao-Ou Shu; Qiuyin Cai; Xiang Shu; Bingshan Li; Xingyi Guo; Fei Ye; Kyriaki Michailidou; Manjeet K. Bolla; Qin Wang; Joe Dennis; Irene L. Andrulis; Hermann Brenner; Georgia Chenevix-Trench; Daniele Campa; Jose E. Castelao; Manuela Gago-Dominguez; Thilo Dörk; Antoinette Hollestelle; Artitaya Lophatananon; Kenneth Muir; Susan L. Neuhausen; Håkan Olsson; Dale P. Sandler; Jacques Simard; Peter Kraft; Paul D. P. Pharoah; Douglas F. Easton; Wei Zheng; Jirong Long


J Natl Cancer Inst. 2020;112(3):295-304. 


Abstract and Introduction


Background: DNA methylation plays a critical role in breast cancer development. Previous studies have identified DNA methylation marks in white blood cells as promising biomarkers for breast cancer. However, these studies were limited by low statistical power and potential biases. Using a new methodology, we investigated DNA methylation marks for their associations with breast cancer risk.

Methods: Statistical models were built to predict levels of DNA methylation marks using genetic data and DNA methylation data from the HumanMethylation450 BeadChip in the Framingham Heart Study (n = 1595). The prediction models were validated using data from the Women's Health Initiative (n = 883). We applied these models to genomewide association study (GWAS) data from 122 977 breast cancer patients and 105 974 controls to evaluate whether genetically predicted DNA methylation levels at CpG sites (CpGs) are associated with breast cancer risk. All statistical tests were two-sided.

Results: Of the 62 938 CpGs investigated, statistically significant associations with breast cancer risk were observed for 450 CpGs at a Bonferroni-corrected threshold of P < 7.94 × 10⁻⁷, including 45 CpGs residing in 18 genomic regions that have not previously been associated with breast cancer risk. Of the remaining 405 CpGs, located within 500-kilobase flanking regions of 70 GWAS-identified breast cancer risk variants, the associations for 11 CpGs were independent of GWAS-identified variants. Integrative analyses of genetic, DNA methylation, and gene expression data found that 38 CpGs may affect breast cancer risk through regulating expression of 21 genes.

Conclusion: Our new methodology can identify novel DNA methylation biomarkers for breast cancer risk and can be applied to other diseases.
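
The figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes a family-wise significance level of 0.05, which the abstract does not state explicitly; it is inferred here because 0.05 divided by the number of CpGs tested reproduces the reported threshold. The variable names are illustrative only.

```python
# Arithmetic check of figures quoted in the abstract.
# ASSUMPTION: alpha = 0.05 is not stated in the text; it is inferred because
# it reproduces the reported Bonferroni threshold.
alpha = 0.05
n_cpgs_tested = 62_938

bonferroni_threshold = alpha / n_cpgs_tested
print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")  # 7.94e-07

# Sample size in the subtitle: patients plus controls.
n_cases, n_controls = 122_977, 105_974
print(f"Total women: {n_cases + n_controls}")  # 228951

# Significant CpGs: novel regions plus those near known GWAS variants.
n_novel, n_near_known = 45, 405
print(f"Significant CpGs: {n_novel + n_near_known}")  # 450
```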


Breast cancer is the most common cancer among women in the United States and in many other countries around the world.[1] DNA methylation plays a critical role in the development of cancers, including breast cancer.[2]

DNA methylation of several genes in white blood cells has been associated with breast cancer risk; however, the results have been inconsistent.[3–7] Most of these studies had a retrospective design. Two prospective studies found that overall DNA hypomethylation in white blood cells was associated with increased breast cancer risk.[8,9] In addition, a panel of 250 CpG sites (CpGs) in white blood cell DNA was identified as predictive of breast cancer risk.[10] However, none of these CpGs were consistently observed in a later study.[9] These studies were limited by small sample size, lack of replication, and/or reverse causation. Furthermore, the repeatability of DNA methylation measurements at some CpGs using the HumanMethylation450 BeadChip (Illumina, San Diego, CA) was not optimal,[11] which may have contributed to the inconsistency across studies.

A recent study indicated the epigenetic supersimilarity of monozygotic twin pairs.[12] More recently, 24 heritable CpGs were associated with breast cancer risk.[13] Multiple genetic variants have been identified as DNA methylation quantitative trait loci (meQTL),[14–16] suggesting that DNA methylation at some CpGs is genetically determined and thus can be predicted using genetic variants. Studies using cis (500 kilobase [kb] flanking regions) meQTL single-nucleotide polymorphisms (SNPs) have discovered novel CpGs for diseases.[17,18] However, the proportion of variance explained by a single meQTL SNP is typically small for most CpGs. Herein, we propose a new methodology to build statistical models to predict DNA methylation in white blood cells via multiple SNPs in a reference dataset and then apply the models to large genomewide association study (GWAS) datasets to evaluate genetically predicted DNA methylation in association with disease risk. We tested this methodology by investigating the association of genetically predicted DNA methylation with breast cancer risk using data from 122 977 breast cancer patients and 105 974 controls.
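
The two-stage idea described above (fit a multi-SNP prediction model in a reference dataset, then test predicted methylation against disease status in a separate GWAS sample) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' pipeline: the text here does not specify the model family or the association test, so ridge regression and a Welch-type z statistic are used as stand-ins, and all data, effect sizes, sample sizes, and function names are hypothetical.

```python
import math
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ridge(genotypes, methylation, penalty=1.0):
    """Stage 1: fit methylation ~ SNP dosages; return (intercept, weights)."""
    n, p = len(genotypes), len(genotypes[0])
    g_mean = [sum(row[j] for row in genotypes) / n for j in range(p)]
    y_mean = sum(methylation) / n
    xc = [[row[j] - g_mean[j] for j in range(p)] for row in genotypes]
    yc = [y - y_mean for y in methylation]
    xtx = [[sum(r[j] * r[k] for r in xc) + (penalty if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * y for r, y in zip(xc, yc)) for j in range(p)]
    weights = solve_linear(xtx, xty)
    intercept = y_mean - sum(w * g for w, g in zip(weights, g_mean))
    return intercept, weights

def predict(intercept, weights, genotypes):
    """Genetically predicted methylation for each individual."""
    return [intercept + sum(w * g for w, g in zip(weights, row))
            for row in genotypes]

def welch_z(cases, controls):
    """Stage 2: z statistic for the case-control difference in means."""
    def mean_var(v):
        mu = sum(v) / len(v)
        return mu, sum((x - mu) ** 2 for x in v) / (len(v) - 1)
    m1, v1 = mean_var(cases)
    m0, v0 = mean_var(controls)
    return (m1 - m0) / math.sqrt(v1 / len(cases) + v0 / len(controls))

def simulate_genotypes(rng, n, maf=0.3, n_snps=5):
    """Dosages (0, 1, or 2) for n individuals at n_snps independent SNPs."""
    return [[sum(rng.random() < maf for _ in range(2)) for _ in range(n_snps)]
            for _ in range(n)]

rng = random.Random(0)
TRUE_WEIGHTS = [0.5, -0.3, 0.2, 0.0, 0.4]  # hypothetical cis-meQTL effects

def true_methylation(row):
    genetic = sum(w * g for w, g in zip(TRUE_WEIGHTS, row))
    return genetic + rng.gauss(0, 0.5)  # non-genetic variation

# Stage 1: reference panel with both genotypes and measured methylation.
ref_geno = simulate_genotypes(rng, 1500)
ref_meth = [true_methylation(row) for row in ref_geno]
intercept, weights = fit_ridge(ref_geno, ref_meth)

# Stage 2: GWAS sample with genotypes and disease status only.
gwas_geno = simulate_genotypes(rng, 4000)
predicted = predict(intercept, weights, gwas_geno)
cases, controls = [], []
for row, pred in zip(gwas_geno, predicted):
    # Simulated risk rises with (unobserved) methylation; 0.48 is its mean.
    p_case = 1 / (1 + math.exp(-(true_methylation(row) - 0.48)))
    (cases if rng.random() < p_case else controls).append(pred)

z = welch_z(cases, controls)
print("estimated weights:", [round(w, 2) for w in weights])
print("association z statistic:", round(z, 1))
```

Combining several SNPs matters here because, as noted above, any single meQTL SNP explains only a small share of methylation variance; the multi-SNP predictor captures more of the genetic component, which is what gives the second-stage test its power.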