Pam Harrison

October 17, 2011

October 17, 2011 (Montreal, Quebec) — An electronic medical record (EMR)-based algorithm, developed by 5 institutions with different EMR systems in the United States, has proven to be portable, productive, and cost effective. It also allows for significant expansion of genotype–phenotype studies conducted by members of the multicenter Electronic Medical Records and Genomics (eMERGE) Network.

Dana Crawford, PhD, assistant professor of molecular physiology and biophysics, Vanderbilt University, Nashville, Tennessee, and her eMERGE colleagues used phenotype data derived from EMRs from the 5 participating institutions to perform genomic-association studies. Results showed not only that the algorithm, which was initiated by Vanderbilt University and validated by each subsequent institution, was "useable" by all eMERGE members, but also that EMR-linked genomic data could be "reused" to carry out network-wide genome-wide association studies (GWAS) for diseases and traits, such as autoimmune hypothyroidism. Among the phenotypes explored by individual members of eMERGE were dementia, cataracts, peripheral arterial disease, type 2 diabetes, and cardiac conduction.

"In the United States, healthcare is not centralized; we have different clinical practices, different EMR systems, and different populations, so we were not sure that we could set up an algorithm at one site and then use it at all the other sites," Dr. Crawford told Medscape Medical News.

Dr. Crawford presented the study findings here at the 12th International Congress of Human Genetics and the 61st American Society of Human Genetics Annual Meeting.

eMERGE Network

The eMERGE Network was established in 2007, and represents a collaboration of 5 EMR-linked biobanks: Group Health Cooperative with the University of Washington, Seattle; Marshfield Clinic, Wisconsin; Mayo Clinic, Rochester, Minnesota; Northwestern University, Chicago, Illinois; and Vanderbilt University. "Each network has an extensive quality-control genotyping pipeline," Dr. Crawford explained, "and across the network, we have approximately 13,000 European American samples."

To develop the algorithm, researchers focused on primary hypothyroidism (PH) as their phenotype of interest. To identify cases of PH, "electronic selection logic," with 3 identifying criteria, was used: an International Classification of Diseases, Ninth Revision (ICD-9) billing code for hypothyroidism; laboratory values (abnormal thyroid-stimulating hormone [TSH] or T4 levels); and the presence of thyroid-replacement medication.

Control subjects had no ICD-9 diagnosis, had normal TSH levels, and were not receiving thyroid-replacement medication. With this algorithm, "we defined about 1300 cases of PH and approximately 5000 controls from across the network," Dr. Crawford explained. To see how well the algorithm selected for cases and controls, researchers determined the positive predictive value (PPV) of the algorithm. They found that the average weighted PPV was 92% for cases and 98% for controls.
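The selection logic and PPV check described above can be illustrated with a minimal Python sketch. It is not the eMERGE implementation: the record fields, the function names, the rule that a case must meet all 3 criteria, and the choice to weight each site's PPV by the number of charts reviewed are all assumptions made for illustration, and the actual algorithm likely includes additional exclusions not described here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EMR data."""
    has_hypothyroidism_icd9: bool        # any ICD-9 billing code for hypothyroidism
    tsh_abnormal: Optional[bool]         # None if TSH was never measured
    t4_abnormal: Optional[bool]          # None if T4 was never measured
    on_thyroid_replacement: bool         # thyroid-replacement medication on record


def classify(rec: PatientRecord) -> str:
    """Label a record as 'case', 'control', or 'excluded'.

    Assumption: a case must satisfy all 3 criteria (billing code,
    abnormal TSH or T4, and medication). A control has no code,
    a normal TSH result, and no medication. Anything else is excluded.
    """
    abnormal_lab = rec.tsh_abnormal is True or rec.t4_abnormal is True
    if rec.has_hypothyroidism_icd9 and abnormal_lab and rec.on_thyroid_replacement:
        return "case"
    if (
        not rec.has_hypothyroidism_icd9
        and rec.tsh_abnormal is False    # TSH must be measured and normal
        and not rec.on_thyroid_replacement
    ):
        return "control"
    return "excluded"


def weighted_ppv(site_reviews: List[Tuple[int, int]]) -> float:
    """Pool PPV across sites from manual chart review.

    Each tuple is (charts confirmed by review, charts reviewed) for one
    site. Weighting each site's PPV by its number of reviewed charts
    (an assumed weighting) reduces to total confirmed / total reviewed.
    """
    total_reviewed = sum(reviewed for _, reviewed in site_reviews)
    if total_reviewed == 0:
        raise ValueError("no charts were reviewed")
    total_confirmed = sum(confirmed for confirmed, _ in site_reviews)
    return total_confirmed / total_reviewed


if __name__ == "__main__":
    # Made-up records and review counts, for illustration only.
    print(classify(PatientRecord(True, True, None, True)))
    print(classify(PatientRecord(False, False, None, False)))
    print(round(weighted_ppv([(45, 50), (95, 100)]), 3))
```

Treating a missing TSH result as grounds for exclusion, rather than as a control, is a conservative choice in this sketch: a patient who was never tested cannot be shown to have normal thyroid function.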

Collective GWAS of the 1300 cases and 5000 control subjects found an association between PH and common variants on chromosome 9 near the FOXE1 gene that was of genome-wide significance. Using an independent dataset of European Americans (263 cases of PH and 1616 control subjects), researchers found that the same locus was statistically significant. As Dr. Crawford noted, FOXE1 is highly expressed in the thyroid and pituitary glands.

"FOXE1 encodes for thyroid transcription factor, and likely plays a crucial role in thyroid morphogenesis," she added. "We know mutations in this gene are associated with congenital hypothyroidism and cleft palate. Variants around FOXE1 have also recently been associated with thyroid cancer."

"Our hope is that these findings will point to the actual etiology or biology of PH," Dr. Crawford said. "Now that we have these large datasets, we can get together and make it happen faster."

Rex Chisholm, PhD, vice dean for scientific affairs at Northwestern University, told Medscape Medical News that the eMERGE Network benefited from the collaborative process, in which each institution in turn tested the algorithm, verified whether it was working (with a manual chart review), and then tinkered with it to maximize its PPV.

"One site might not have had adequate statistical power to identify genetic variants associated with a specific condition, but once you go across the network, you increase your power dramatically," he observed. The National Human Genome Research Institute has now sanctioned a phase 2 continuation of eMERGE Network research, and has provided funding to include 2 more institutions.

"With these 7 sites, we now have 50,000 individuals with EMR data, and we are discussing which phenotypes we might focus on," Dr. Chisholm said, adding that the phase 2 research will encompass some 40 phenotypes, compared with the 14 phenotypes evaluated in phase 1.

12th International Congress of Human Genetics (ICHG) and the 61st American Society of Human Genetics (ASHG) Annual Meeting: Abstract 95. Presented October 13, 2011.

