Prevalence in the United States of Selected Candidate Gene Variants: Third National Health and Nutrition Examination Survey, 1991-1994

Man-huei Chang; Mary Lou Lindegren; Mary A. Butler; Stephen J. Chanock; Nicole F. Dowling; Margaret Gallagher; Ramal Moonesinghe; Cynthia A. Moore; Renée M. Ned; Mary R. Reichler; Christopher L. Sanders; Robert Welch; Ajay Yesupriya; Muin J. Khoury; CDC/NCI NHANES III Genomics Working Group


Am J Epidemiol. 2009;169(1):54-66. 

Materials and Methods

Survey Design

NHANES III is a complex, multistage sample survey conducted by the National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention (CDC),[35,43] during 1988-1994. This cross-sectional study was designed to provide national statistics on the health and nutritional status of the civilian, noninstitutionalized population in the United States aged 2 months or older. Certain populations, including young children, older adults, non-Hispanic blacks, and Mexican Americans, were oversampled.[35] As with standard NHANES analyses, race/ethnic groups were defined on the basis of the combination of the reported race (black, white, other) and reported ethnicity (not Hispanic, Mexican American, other Hispanic) of survey participants.[35] Detailed household interviews were conducted to obtain information on sociodemographic variables, medical history, health-related behaviors, and use of medications. As part of the survey, physical examinations and laboratory and radiologic measurements were performed in special mobile examination centers.[35]


During Phase 2 of NHANES III (1991-1994), 10,052 participants aged 12 years or older were examined in the mobile examination centers. As part of the examination consent, participants agreed that their blood could be kept for long-term storage and future research, although genetic research was not mentioned specifically. In August 2001, the CDC/NCHS Ethics Review Board approved a revised plan for use of these specimens according to guidelines in the August 1999 National Bioethics Advisory Commission report on the use of stored biologic materials for research. This revised plan allows linkage of the genetic laboratory results to NHANES data through the NCHS Research Data Center to ensure that confidentiality of the study participants' identities is maintained.[44] Attempts were made to establish Epstein-Barr virus-transformed cell lines[35,44] from white blood cells obtained from 8,200 of the Phase 2 participants. However, the final NHANES III DNA bank contains 7,159 participants because of the inability to transform and grow a successful immortalized cell line (n = 1,004), concerns regarding laboratory practice and quality assurance (n = 21), and exclusion of 16 individuals who were not genotyped. The bank is jointly maintained by both NCHS and the National Center for Environmental Health at CDC. Demographic characteristics of participants in the DNA bank are included in Table 1. Sixty-two percent of participants were from households with multiple family members (average, 1.59 members per household; range, 1-11). This prevalence study was approved by the NCHS Ethics Review Board.

Selection of Candidate Genes and Variants

Members from a multidisciplinary working group reviewed available phenotype data from NHANES III, performed systematic literature reviews, and identified candidate genes and physiologic pathways thought to be associated with diseases of public health significance at the time of project initiation. The selection of polymorphisms for this study was also based upon input from the SNP500Cancer resource,[45] which had already developed genotyping assays for numerous SNPs in the selected genes based on their potential importance to physiologic processes, epidemiologic studies, and health outcomes.

The selected variants are in genes that encode proteins in 6 major cellular and physiologic pathways: 1) nutrient metabolism (e.g., homocysteine, lipids, glucose, and alcohol); 2) immune and inflammatory responses; 3) xenobiotic metabolism (e.g., of drugs, carcinogens, or environmental contaminants); 4) DNA repair; 5) hemostasis and the renin-angiotensin-aldosterone system; and 6) oxidative stress. The variants are in pathways affecting the development of multiple diseases, including cardiovascular disease, diabetes, cancer, and infectious diseases, as well as modulation of the effects of environmental and occupational exposures.

Genotyping Methods

DNA analysis for the project was performed at two facilities because neither lab had methodology developed to analyze all of the genetic variants: 1) the Core Genotyping Facility, National Cancer Institute (NCI), National Institutes of Health, Bethesda, Maryland, and 2) the Division of Laboratory Sciences, National Center for Environmental Health, CDC, Atlanta, Georgia. Each lab analyzed all DNA specimens for each subset of genotyping assays performed.

Most polymorphisms were assayed by either the TaqMan assay (5' nuclease assay; Applied Biosystems, Foster City, California) or the MGB Eclipse assay (3' hybridization-triggered fluorescence reaction; Nanogen (formerly Epoch Biosciences), Bothell, Washington). Two polymorphisms were genotyped by pyrosequencing, and one by capillary fragment analysis. Water controls and DNA samples with known genotypes, purchased from Coriell Cell Repositories (Camden, New Jersey), were included on each 384-well plate. Detailed genotyping methods, including primer and probe sequences, are described in Web Appendix 1 and Web Table 1, respectively. (This information is described in Web-only material that includes 8 Web appendixes, 1 Web table, and 1 Web figure; each is preceded by "Web" in the text. All are posted on the Journal's website.)

Quality Control

The NHANES III genotyping data were monitored by a quality assurance and quality control committee composed of experts in laboratory science at CDC and NCI. The group monitored results of NHANES III quality control genotyping to ensure that the data met quality control guidelines established by NCHS.

Initial quality assurance assessments determined that at least 7,128 specimens, depending on the laboratory, were suitable for genotyping analysis on the basis of sample quality. All polymorphisms with genotyping call rates below 95% completion did not meet quality control criteria and were removed from further analyses. NHANES provides 480 quality control specimens for all studies that use the NHANES III DNA bank samples. These include blind replicates of approximately 6% of the 7,159 samples, to determine the accuracy and reproducibility of the assays. Assays that passed the blind-replicate analyses (>98% concordance according to NCHS guidelines) were tested for deviation from Hardy-Weinberg proportions calculated separately for each race/ethnic group in a standard unweighted analysis.[46] The threshold for a genetic variant to pass Hardy-Weinberg analysis was P ≥ 0.01 (2 sided) for at least 2 of the 3 main race/ethnic groups (i.e., non-Hispanic white, non-Hispanic black, and Mexican American), with use of a chi-square goodness-of-fit test. The race/ethnicity category "other" was not used in determining the deviation from Hardy-Weinberg proportions because of the genetic heterogeneity of this group. Data from 192 samples were removed from certain assays because of a sample handling issue discovered in one of the laboratories. Genetic variants that met all quality control guidelines were used for further analyses. The range of successful genotype identifications for these variants was 97.5%-99.9% (median, 99.2%). Results from the tests of deviations from Hardy-Weinberg proportions for these variants are listed in Web Appendix 2.
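The Hardy-Weinberg screening rule described above can be illustrated with a minimal Python sketch. It is not the SAS/Genetics procedure used in the study; the function names, the dictionary layout, and the genotype counts in the usage example are hypothetical and serve only to show a 1-degree-of-freedom chi-square goodness-of-fit test for a biallelic variant, applied with the "P ≥ 0.01 in at least 2 of 3 groups" criterion.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) against Hardy-Weinberg proportions.

    Takes unweighted genotype counts for a biallelic variant (total > 0)
    and returns (chi-square statistic, P value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip cells with zero expectation (monomorphic variant).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def passes_hwe(counts_by_group, alpha=0.01, min_groups=2):
    """Apply the rule: P >= alpha in at least min_groups of the groups tested."""
    passing = sum(
        1 for counts in counts_by_group.values()
        if hwe_chi_square(*counts)[1] >= alpha
    )
    return passing >= min_groups


# Hypothetical genotype counts (AA, AB, BB), for illustration only.
example = {
    "non-Hispanic white": (25, 50, 25),
    "non-Hispanic black": (36, 48, 16),
    "Mexican American": (50, 0, 50),
}
print(passes_hwe(example))
```

In this made-up example two groups match Hardy-Weinberg expectations exactly and one deviates strongly, so the variant would still pass under the 2-of-3 rule.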

Overall, 90 variants in 50 genes were available for estimation of allele frequency and genotype prevalence. Nearly all (n = 87) of the variants genotyped are SNPs, and 3 are insertion/deletions. Various diseases or conditions for which these genes have a confirmed or purported association are shown in Web Appendix 3. This list is not comprehensive, but it demonstrates that the genes studied are involved in major pathways that have a role in the etiology of several diseases or conditions with public health significance.

Statistical Analysis

Sample Weights. Because NHANES III is a multistage, complex sample survey, all statistical analyses must account for sample weights and the survey design to produce unbiased national estimates and appropriate standard errors. The variance in clustered data caused by households with multiple related study participants was accounted for by use of the appropriate sample weights and the survey design in SUDAAN software (SUDAAN Statistical Software Center, Research Triangle Park, North Carolina). Point estimates and variances were calculated by using sample weights recalculated[47] for the Genetic Component of NHANES III. These weights were derived from the appropriate NHANES III, Phase 2, mobile examination center (MEC) sample weights to adjust for participant refusal to consent to future research and for the inability to generate cell lines and obtain DNA as mentioned above. NHANES genetic weights are specifically estimated for the genetic component of the 7,159 DNA bank participants, and none of the other weights provided by NHANES is appropriate. More detailed information about statistical weights in NHANES III is available online.

Prevalence Estimation. Analyses were conducted by using SAS-callable SUDAAN, version 9.01, and SAS, version 9.1 (SAS Institute, Inc., Cary, North Carolina). Deviations from Hardy-Weinberg proportions were tested with a chi-square goodness-of-fit approach by using SAS/Genetics (SAS Institute, Inc.). Allele frequency and genotype prevalence were calculated in SUDAAN and weighted by using the NHANES III Genetic Component sample weights for each gene variant for all major race/ethnic groups (i.e., non-Hispanic white, non-Hispanic black, Mexican American, and other) (data for "other" are not shown), age groups, and sexes. Point estimates and 95% confidence intervals were calculated and weighted for each race/ethnic group in SUDAAN to obtain the nationally representative estimates for the US population. The Taylor series linearization approach,[48,49] which derives a linear approximation of variance estimates to develop corrected standard errors and confidence intervals, was implemented to estimate variances. Tests of the difference in allele frequencies among race/ethnic groups ("other" was excluded), age groups, and sexes were performed by using polytomous logistic regression. Tests of the differences in genotype prevalence among these groups were evaluated using the Wald chi-square method. Statistical significance was defined as P < 0.05. The differences in allele frequency and genotype prevalence by age and by sex were examined after adjustment for race/ethnicity by using the Cochran-Mantel-Haenszel test at a significance level of 0.05.
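To make the weighted estimation concrete, the following Python sketch computes a weighted allele frequency as a ratio estimator and approximates its standard error by Taylor series linearization over primary sampling units (PSUs) within strata. It is an illustration under stated assumptions, not a reproduction of SUDAAN: it assumes a with-replacement design at the PSU level, ignores strata with a single PSU, and uses a hypothetical record layout of (stratum, psu, weight, allele_count). The numbers in the usage example are invented.

```python
import math
from collections import defaultdict


def weighted_allele_frequency(records):
    """Weighted allele frequency with a linearized standard error.

    records: iterable of (stratum, psu, weight, allele_count), where
    allele_count is the number of copies (0, 1, or 2) of the allele
    carried by one participant. Returns (frequency, standard_error).
    """
    records = list(records)
    numerator = sum(w * a for _, _, w, a in records)
    denominator = sum(2.0 * w for _, _, w, _ in records)  # 2 alleles each
    freq = numerator / denominator

    # Linearized score of the ratio estimator, summed within each PSU.
    psu_scores = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, a in records:
        psu_scores[stratum][psu] += w * (a - 2.0 * freq) / denominator

    # Between-PSU variance within each stratum, summed over strata.
    variance = 0.0
    for psus in psu_scores.values():
        n_h = len(psus)
        if n_h < 2:
            continue  # assumption: single-PSU strata contribute nothing here
        mean = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - mean) ** 2 for z in psus.values())
    return freq, math.sqrt(variance)


# Invented data: one stratum, two PSUs, one participant in each.
data = [
    ("stratum1", "psu1", 300.0, 2),
    ("stratum1", "psu2", 100.0, 0),
]
freq, se = weighted_allele_frequency(data)
print(freq, se)
```

In the invented data the unweighted allele frequency would be 0.5, whereas the weighted estimate is pulled toward the participant who represents more of the population. This is why the choice of sample weights matters for national estimates.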

