Data Gathering Transforming Medical Research

Ingrid Hein

September 30, 2016

Extracting information from massive databases is the new way of gathering medical intelligence and doing research, according to a group of data scientists from the private and public sectors.

"Precision medicine is not a single program, it is a movement. It's something we all are doing to enter the age of genomic medicine," said DJ Patil, PhD, chief data scientist and deputy chief technology officer for data policy for the White House.

The Precision Medicine Initiative, a $215 million federally funded project to create a repository for medical data on 1 million Americans over the next 4 years, "is just one example of making this movement real," he told the audience at the Health 2.0 Annual Fall Conference 2016 in Santa Clara, California.

Enthusiasm for data collection and analysis is on the rise. Greg Orr, senior director of digital health at Walgreens, explained that his company is involved in the recruitment of customers for the Participant Technologies Center. The center will be responsible for the creation of mobile apps related to enrollment, consent, and data collection for the Precision Medicine Initiative Cohort Program.


About 6 million customers are served every day by the 8100 Walgreens stores across the nation. "We serve just about every neighborhood," said Orr. "What we offer is being able to reach out to these folks."

Along with the drugstore chain, four other institutions — the Scripps Research Institute, Vibrent Health, PatientsLikeMe, and Sage Bionetworks — have been awarded funding to help recruit participants.

Walgreens has its own connectivity platform, with plug-in devices and apps, that is currently being used by about 1 million customers, Orr reported. Much of the data are "out in the wild," he pointed out, but they could be leveraged in the future.

Collecting data for the cohort will not be easy unless participants get something in return, said Claudia Williams, MS, senior advisor for health innovation and technology at the White House Office of Science and Technology Policy.

According to a National Institutes of Health survey (PLoS One. 2016;11:e0160461), people will want access to their personal data in return for their participation, Williams reported. "The initial motivation is big, but friction is also big," she said. "If we reduce friction and give something back, we're on solid footing."

Although electronic health records are already being used to share data for research purposes, they are not neatly integrated with other data stores, which often contain more specialized and precise information.

Getting 'Down and Dirty With the Data'

"It's so important to get down and dirty with the data and start to organize it so we can be ready to show that pace of change," said Amy Abernethy, MD, chief medical officer and chief scientific officer at Flatiron Health. The start-up has partnered with the US Food and Drug Administration to determine how real-world evidence — derived from de-identified, HIPAA-compliant patient data captured outside of clinical trials — can provide insights into the safety and effectiveness of emerging anticancer therapies, such as immunotherapeutic agents.

To date, Flatiron's core analytics database is tracking almost 1.5 million cancer patients who have been treated at more than 250 community practices and academic medical centers since 2011.

"That's about one in five cancer patients," said Dr Abernethy.

"In particular, we focus on organizing unstructured documents," she explained. "These include medical case records, radiology reports, pathology reports, and the condolence card — the best indication that a patient has died."

Flatiron pulls electronic health records from more than 200 clinics into its database and processes them nightly, Dr Abernethy explained. The company then embeds third-party data streams — such as mortality, genomics, and claims data — to provide a more complete picture to researchers.

"Histology found in the pathology report is embedded in the right places in the dataset using a human review," she added.

To demonstrate how quickly the database can extract information, Dr Abernethy pulled up a dataset of patients with advanced non-small-cell lung cancer in the United States.

"About a quarter of cancer patients with non-small-cell lung cancer receiving care in the United States have squamous cell histology," she reported. "This is the kind of thing you cannot get right in our usual datasets, but now we're able to see it in real time." A traditional clinical trial can take several years to extract this kind of data, she pointed out.

Human Longevity, another company steeped in data, is sequencing whole human genomes on a large scale.

The Human Genome

"We have sequenced our first 20,000 genomes, and identified about 500 million variants," said Brad Perkins, MD, chief medical officer of the company. However, almost nothing is known about 99% of those variants, he added.

Still, the data are very useful for what is already known. "Genes are amazing. They can answer a lot of questions a doctor may need answers to," explained Victor Lavrenko, PhD, technical lead at Human Longevity.

A doctor can enter keywords into a search engine to retrieve a patient's genetic mutations from the database. Dr Lavrenko demonstrated this by pulling up data on a patient who had about 4 million mutations.

"Most of us have 4 million," he explained. The database showed that the patient had lipase deficiencies. "The patient came in with atherosclerosis; he had clots. Lipase is the stuff that dissolves the clots," he said.

In this case, a doctor might want to prescribe warfarin, said Dr Lavrenko. But "if you type in that drug, you will get a list of mutations that are related to the drug. In this case, the patient has warfarin sensitivity."

The company currently has 25 petabytes of data stored on Amazon, and that volume is expected to grow substantially.

Ultimately, the genome sequencing will be most useful when combined with "deep quantitative integration with other phenotype data," said Dr Perkins. This will drive "individualized risk assessments that, in turn, will drive prevention and protection."

This is all very new and must always be kept in perspective, Williams cautioned. "What has changed in the last decade is that, finally, we have the promise of digital data. We're really just starting to bring the data together."

It is important that the data work for us, not against us, she added. "We always have to remember that we are here collectively to do this for a person, a loved one. We should think of a technology as neither radical nor revolutionary unless it benefits us. We have to remember who we are building this for, and always focus on the individual."

The federally funded Precision Medicine Initiative awarded Walgreens, a partner of WebMD, and the Scripps Translational Science Institute funding to recruit study participants. Medscape is part of the WebMD Health Professional Network, and Dr Eric Topol, director of Scripps, is editor-in-chief of Medscape.

Health 2.0 Annual Fall Conference 2016. Presented September 27, 2016.

