Big Data: Can Medicine Learn From Amazon and Netflix?

Robert A. Harrington, MD; Euan A. Ashley, MRCP, DPhil


April 20, 2015

Robert A. Harrington, MD: Hi. This is Bob Harrington on Medscape Cardiology.

We're going to tackle a topic today that has gotten a lot of interest in both the medical literature as well as the lay press over the course of the past few years. That's the topic of big data. It's become a phrase that's perhaps used more than it's understood, and in cardiovascular medicine, I think that's especially true. People are trying to figure it out. Everyone's talking about big data, but what actually is it? How is it going to help inform what we do in science? How is it going to help us inform what we do in clinical practice?

It deals with the subject of genomics. It deals with the subject of the other "omics": metabolomics, proteomics, et cetera. It also deals with issues of trying to define a population. As we move from a fee-for-service system of medical care delivery to one largely oriented around population health, our ability to utilize data to inform decision-making is going to become increasingly critical.

Today, I am really pleased to have a friend and colleague from Stanford join me here for this conversation. Dr Euan Ashley is a cardiologist at Stanford. He's an associate professor of medicine in the division of cardiovascular medicine. For the purpose of this discussion, maybe most important, he's the director of the Stanford big data and biomedicine initiative. Euan, thanks for joining us here today on Medscape Cardiology.

Euan A. Ashley, MRCP, DPhil: Thanks for having me, Bob. It's really a pleasure to be here.

Defining Big Data

Dr Harrington: Euan, as I said in my intro remarks, people talk about big data, but we get a lot of different views about what it actually means. You are the director, as I said, of our local big data and biomedicine initiative. Define for our listening audience what you think about in the context of cardiovascular medicine, health, science—what you're thinking about when you think about the phrase "big data."

Dr Ashley: I think that's the most common question that we get asked and there are almost as many definitions of big data as there are data points in some of these data sets.

There are a few that I like. One is, big data is any data that's bigger than what you're used to dealing with. Some people say big data is anything that doesn't fit in an Excel spreadsheet.

Probably the most common definition out there is one that is called the three Vs: big data is defined by a large volume of data, a high velocity (data coming at you fast), and a large variety of data. That's an attractive definition, because it starts to get at the issue.

Most people have a sense of what big data is from the technology world. They deal with it every day. For example, if you log onto Amazon and you start to put something in your cart, Amazon starts to suggest a bunch of other things that you might want to buy because other people have bought those things after purchasing the item in your cart. What's going on behind the scenes is big data. It's basically taking large amounts of, in this case, clicks—large amounts of data about individual people and what they purchased—and making predictions about what they might want to purchase in the future. Netflix does the same in recommending movies that you might like to watch.
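As a rough illustration of the "people who bought this also bought" idea described above, here is a minimal Python sketch. It simply counts which items appear together in past orders. The order data and the function names are hypothetical, and production recommendation systems are far more sophisticated than this.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase histories: one set of items per past order.
orders = [
    {"tent", "sleeping bag", "headlamp"},
    {"tent", "sleeping bag"},
    {"tent", "camp stove"},
    {"headlamp", "batteries"},
]


def build_co_purchase_counts(orders):
    """Count how often each pair of items appears in the same order."""
    counts = defaultdict(Counter)
    for order in orders:
        for item, other in permutations(order, 2):
            counts[item][other] += 1
    return counts


def recommend(item, counts, top_n=2):
    """Return the items most often bought together with `item`.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(counts[item].items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:top_n]]


counts = build_co_purchase_counts(orders)
suggestions = recommend("tent", counts)
print(suggestions)  # ['sleeping bag', 'camp stove']
```

The point of the sketch is that nothing about tents or sleeping bags is programmed in up front. The suggestions fall out of the accumulated purchase data.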

One of the best examples is the fraud detection on our credit cards. Recently, I was trying to buy a gift for my brother who lives back home in Scotland. I went online, put my credit card in, and ordered it for him in Scotland. Almost immediately, I got a text message from Chase, my credit card provider, saying, 'Hey, did you mean to do that? This is not your normal pattern of purchasing.' The technology that goes behind that is the most intuitive definition of big data — it's taking large amounts of data and allowing the computer to learn from it.

But you also have to think about cardiovascular medicine. I think that's very important. Where can we use data there? I think that we could talk a bit about the variety of ways into the use of big data within cardiovascular medicine.

Machine Learning

Dr Harrington: We're going to come back to talk about both cardiovascular science and the practice of clinical medicine. But I first want to pause for a minute and reflect on what you said. Many of us use Netflix and Amazon weekly, if not every day. And I love your description of machine learning. It's through the accumulation of this data — it's not just static, it's not sitting there like a structured database, where the only way to retrieve the information is for you to ask a question in a certain way after the database was organized, such that data could be then extracted and provide you with an answer. That's much different. The fact that you're ordering a movie (let's say it's a mystery), then next time you log into Netflix, it says, 'You might also like...'—that's pretty sophisticated, from a computing perspective.

Dr Ashley: I think it's really a game changer in many ways. In some ways, it is not so different from what's been done before. Our statistical colleagues have been thinking about predictive analytics for a while. But I think you put your finger right on it. The difference with the new approach, and where machine learning really stands out, first of all is that it is a bit more dependent on having these very large data sets. But the computer is actually learning the patterns in a way that isn't prescribed up front. I think that the key there, and the really interesting thing, is that it can pick out patterns that we wouldn't otherwise be able to guess. Being surprised by a pattern is something that is new and relatively unique to a big data world.

Dr Harrington: I agree with you. I always caution—particularly young people who are starting to do research using large amounts of information—that you've got to be agnostic to what the answers are going to be here, and agnostic to what the potential outputs are. We all bring our biases to the table. But in this new world, with huge amounts of information, you've got to be a bit more agnostic about what you expect and be willing to say, that's interesting. What can I learn from that? What else might I ask?

Dr Ashley: It contrasts with a very traditional approach that we see commonly in basic science, where you're almost forced to bring a hypothesis to the table and say, I think it really has to be in this direction. Or I really think it's this molecule, so I'm going to test this one molecule. And now, I'm going to move on to the other one.

We have this analogy now where you can, as you say, come to the question without bias and ask what pattern you see within the data. Later, of course, you can then test that hypothesis and make sure the relationship you find is causal. But in order to discover it, you can use the power of the computer, a power that goes beyond any pattern that you as a human could pick up, to find patterns that otherwise would not be discovered.

Dr Harrington: I'm glad you said that we are not advocating the discarding of hypothesis-based science, but rather, we're saying that this is another way of doing science that is complementary to hypothesis-based science. Is that a fair statement?

Dr Ashley: Exactly. I look at it this way: You only have a certain number of years of your life. We don't have enough time on this earth to test things. If you can be smarter about choosing the things that you test, then that can only be good. I think big data can really help you be smart about stacking the dice in your favor when it comes to doing that experiment.

Born a Geek, but Trained as A Cardiologist

Dr Harrington: Before we start talking about cardiovascular science and cardiovascular practice, let's reflect for a couple of minutes, Euan, on you and your role. You're a cardiologist. You specialize in taking care of patients with heart failure. How did you get into this world of big data? How did you come to play the role at Stanford of overseeing the big data and biomedicine initiative?

Dr Ashley: Great question. I often say that I was born a geek but trained as a cardiologist. As a kid growing up, I was into computers probably more than I should have been. I liked to go outside and play sport too, but I was often inside programming computers. I wrote a tax program for my dad, believe it or not, when I was about 14, so I was an über-geek growing up.

Dr Harrington: Good kid to have around the house, that's for sure.

Dr Ashley: The challenge was, they changed the tax code the next year, so my whole computer program became defunct the next year.

Dr Harrington: Not a good business model.

Dr Ashley: No; I learned early about upgrade cycles. I've always been interested in data and data science. Before there was ever the phrase "big data," I just loved computers and gadgets. That is partly what drove me toward cardiology. As cardiologists, we're partly technologists. We love the technology, and we love to see what that can do for our patients.

Working on the Human Genome

Dr Ashley: I did my PhD at the University of Oxford, and when I learned genetics, at that point it was still pretty low-throughput. We were doing polymerase chain reaction (PCR), amplifying a single gene.

But it was changing. It just began to change, and it changed in large part because of technology, much of which came from around where we are here in Silicon Valley, and biotechnology companies. The microarray was the first one: Why just measure 1, 2, or 10 gene expression measurements at once? If you could, wouldn't it be better to measure 400 or 500? So the microarray was born.

From that, a whole world of high-throughput science was created. It started, of course, with gene expression but moved rapidly toward gene chips. Illumina was one of the companies that pushed the envelope in terms of gene chips, along with Affymetrix. Later, they took on the world of gene sequencing, and we went from a world where the Human Genome Project cost $3 billion at the beginning, probably $500 million of which was spent actually sequencing one genome, to today, when you can sequence a genome for $1000.

I think the world was irrevocably changing around me. I realized that having an interest in computers, statistics, and data science was actually something that was going to be very relevant for medicine, whereas it had originally started as a hobby. It became a very natural fit. I'd gotten very involved in genomics, sequencing, and building algorithms for interpreting the human genome. I think that that grabbed the attention of our dean. He called me up one day and said, I need you to do something for me.

Dr Harrington: You're also being a little modest. You led the team that really put the clinical interpretation around a single human genome. You want to talk a little bit about that, how that came to be? That was a seminal piece of work that now we look back at as maybe changing an era.

Dr Ashley: You're very kind. That was really a big team effort. It came out, in fact, of a conversation I had with one of our chairs of bioengineering at the time, Steve Quake. He was the fifth person in the world to have sequenced his genome. Back then, there really were only five or 10 people whose genomes had ever been sequenced, and he was the fifth.

He was showing me some of the data one day, and he was pointing out some gene variants. Particularly, he showed one that was familiar to me as a physician who looks after patients with hypertrophic cardiomyopathy. It was a variant in myosin-binding protein C, which is a gene that is commonly the cause of hypertrophic cardiomyopathy. He said, "Hey, what do you think of that gene?" I said, "Well, I know that gene quite well." So I started asking about cardiovascular family history. As usual, he couldn't think of anything at first, but then, before I knew it, he was telling me about his dad's ventricular tachycardia (VT), myocardial infarctions (MIs) in his family, and a sudden death in his cousin's son.

On the basis of his family history alone, I invited him to be screened in our clinic. He said, "Oh, you know, my family has been telling me to do that." As a 40-year-old man, he'd never really seen a doctor or even had a lipid panel. He was suddenly walking into our clinic here at Stanford as probably the first patient ever to have a genome "in his back pocket," which was a remarkable thing but also a little bit scary. Here we have a patient with 6 billion data points that are potentially relevant to his health, and he's asking his doctor, is that useful for his care?

I think that was really the inspiration to pull together this team of Stanford and some Harvard computational biologists and physicians to try and work out what would we actually do? The price of genome sequencing is dropping. It became clear that one day, everybody was probably going to have their genome in their medical record, and that we really needed to start thinking about how we were going to deal with that.

Over the course of 6-12 months, we downed tools and all of these groups worked on this one thing. How do we do that? How do we harness the genetic information that's out there to bring it to bear on a single individual's medical care? We put it out there as the beginning of a chapter that is continuing and is really an exciting transformation in the way we do medicine.

The Cost of Dr Quake's Genome Analysis

Dr Harrington: That's a great transition point that I want to get to about how you see this actually affecting how we do clinical medicine. As you say, to look at Dr Quake's genomic sequence, you had to put a team together, and that team worked on this over 6-12 months.

What do you think the spend of that was, Euan, in terms of financial terms? How much do you think it cost you as a team to do that? Enormous amount of computational expertise, and an enormous amount of analytical power devoted to one person—what do you think it cost?

Dr Ashley: At the time, people talked about how we might be able one day to get to the $1000 genome, but we're going to be talking about the million-dollar analysis. So that became a catchword. It didn't cost a million dollars, even that first one. We calculated that it was probably 600,000 or 700,000 person hours, which is a lot.

The good news is, first of all, that was a few years ago now. We need a lot less manual labor. There still is a manual labor part involved because of the challenging and multivariate nature of the evidence that a variant causes disease. That always will require some degree of human input. But a large number of those algorithms that we were building for the first time have really become established and evolved. Many groups around the world have improved them.

Integrating the Genome in the EHR

Dr Harrington: Give us an example of one of these that might resonate with our listening audience, who are largely clinicians and who are interested in how this kind of work might affect what they're going to do—if not in their clinic today, then maybe in the near future. What kind of algorithms have been created that the clinician would resonate with?

Dr Ashley: In one of the thoughts that we had at the time, we were sort of imagining what this would look like in the future. The first part of that is, you shouldn't have to click a button, order a test, and then wait a few weeks for the results to come back, then go through a PDF document of the result that's mostly text. That's what happens today with genetic testing. What we were really envisioning was a situation where the genome was already there, not only present, but integrated into the medical records.

I think a good analogy is the one that we currently have with drug interactions. You have a patient in your clinic, and you're going to decide that they need a new antihypertensive, let's say. You make your choice, you pull up their prescription order form in your electronic health record, and you type it in. But what happens then in the background is that the pharmacy module of the electronic health record is going to check that drug with the drugs that the patient is already on. That's pretty helpful. If there's a well-known interaction, then it's going to stick that on the screen for you, just to remind you, and make you click 14 times that you really meant to prescribe that.

Our hope is that in this new world that we're imagining, the algorithms will work behind the scenes and will look up not just the rest of the drugs the patient's on but also the patient's genome. They'll say, wait a minute, there's an interaction here. Or this particular patient is known to metabolize clopidogrel, for example (or the prodrug for that), faster. You may want to consider a higher dose or a different agent.
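To make the idea of a background check at prescribing time concrete, here is a minimal Python sketch. It assumes the patient's genome has already been interpreted into gene-level phenotypes stored alongside the medication list. The rule tables, class, and function names are hypothetical and for illustration only. The single drug-gene rule shown (clopidogrel is a prodrug activated by CYP2C19, so poor metabolizers may respond less well) was chosen for this example and is not taken from the conversation. A real system would rely on curated, clinically validated knowledge bases.

```python
from dataclasses import dataclass, field

# Illustrative rule tables only, not clinical guidance.
DRUG_DRUG_RULES = {
    frozenset({"warfarin", "aspirin"}): "Increased bleeding risk.",
}
DRUG_GENE_RULES = {
    ("clopidogrel", "CYP2C19", "poor metabolizer"):
        "Reduced activation of this prodrug expected; consider an alternative agent.",
}


@dataclass
class Patient:
    current_drugs: set = field(default_factory=set)
    # Gene -> phenotype, assumed to be precomputed from the stored genome.
    phenotypes: dict = field(default_factory=dict)


def check_prescription(patient, new_drug):
    """Return alert messages for a proposed new drug."""
    alerts = []
    # Existing behavior: check the new drug against current medications.
    for drug in sorted(patient.current_drugs):
        message = DRUG_DRUG_RULES.get(frozenset({drug, new_drug}))
        if message:
            alerts.append(f"{new_drug} + {drug}: {message}")
    # Envisioned behavior: also check the new drug against the genome.
    for gene, phenotype in sorted(patient.phenotypes.items()):
        message = DRUG_GENE_RULES.get((new_drug, gene, phenotype))
        if message:
            alerts.append(f"{new_drug} / {gene} ({phenotype}): {message}")
    return alerts


patient = Patient(
    current_drugs={"aspirin"},
    phenotypes={"CYP2C19": "poor metabolizer"},
)
alerts = check_prescription(patient, "clopidogrel")
for alert in alerts:
    print(alert)
```

The design point is that the genomic lookup sits in the same code path as the familiar drug-drug check, so the clinician sees one set of alerts at the moment of prescribing rather than ordering a separate test.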

Dr Harrington: And there would be this info if, for example, you're starting people on oral vitamin K antagonism—might there be a way that you would tailor the dosing? There have been clinical trials on that[1]; whether or not that actually improves dosing is yet to be shown. But the principle is that this information would work behind the scenes to help the doctor, not necessarily to frighten the doctor into thinking, "Oh my God, this is going to take months to do."

Dr Ashley: Right, and a key point you made there, Bob, is that when we ask physicians, are you interested overall in how genetics might help you prescribe, for example, almost everyone says yes. If you then ask them, do you want to order a test that takes 2 weeks to come back and we'll give you the answer, they generally say no. But if that information was already available and as you prescribed, it popped up on the screen in front of you, would you use it? Again, the answer is universally yes. We're used to doing that as physicians, aren't we, just pulling in whatever information we can from wherever we can get it, to try to make the best individualized decision for the patient in front of us.

Dr Harrington: I'm always saying that as clinicians, we are natural Bayesians. We love to take all this disparate information and assess the probability of something happening. It's always interested me. Although we're natural Bayesians, much of the evidence that we have in clinical medicine is based on frequentist statistics. What you're describing, in many ways, should be highly appealing to clinicians because it appeals to the Bayesian nature of the way they think.

Dr Ashley: I think so, and we're also very used to dealing with uncertainty: There is no black-and-white answer to this, but here's a probability that this could be helpful. We integrate that information routinely as part of our everyday practice. I agree, absolutely.
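The Bayesian habit described here, updating a prior belief as each new piece of information arrives, can be written down in a few lines. The following Python sketch uses the standard odds form of Bayes' theorem, in which post-test odds equal pre-test odds multiplied by the likelihood ratio. The numbers are made up for illustration, and the function name is hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a probability with a likelihood ratio (odds form of Bayes' theorem)."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre_test_probability must be strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical example: a 20% pre-test probability and a finding
# with a positive likelihood ratio of 5.
p = post_test_probability(0.20, 5.0)
print(round(p, 3))  # 0.556
```

Because the output of one update can serve as the input to the next, several findings can be chained in sequence, provided they are reasonably independent of one another.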

The Training Paradigm

Dr Harrington: Now let's talk a little bit about the training paradigm, because the world is changing. The way that you and I went through medical school, what we learned, and how we thought about applying that to our clinical practice, is now much different. Our medical students now are going to have to be much more grounded in the quantitative sciences and truly understanding issues of uncertainty and probability, and drawing inference, than ever before. In your thinking about the big data biomedicine initiative, what are you thinking about from the education and training perspective?

Dr Ashley: That's a really important point. I think the world is not the same today as it was even 5 years ago. The rapidity with which it is changing is on a different scale from the rapidity with which we're used to changing medical curricula, for example. For the training component, we have to address the needs of both the people who are currently in training (those who are just coming down the pike) and those of us who have trained or done our early training some time ago. It's a central component of what we're thinking about around the biomedical data science initiative at Stanford, as you mentioned.

We're very lucky here because there's a significant tradition of computer science at Stanford and a very strong bioinformatics program. It's a heavily oversubscribed training program that's been going for several years. We are very lucky to have a number of students around who are available to help teach and are available to go out into the world and spread the word. But I think we really do need to address this.

The National Institutes of Health (NIH) has done this recently. The NIH has a big data to knowledge initiative called BD2K and has started a number of training grants that are just out right now being reviewed in order to expand the base of people that are trained in this area. But I think there also needs to be change in the medical school curriculum so that our medical students are able to understand and deal with quantitative sciences in a way that they really never had to in the past.

An Initiative in Precision Medicine

Dr Harrington: Mentioning the NIH reminds me of the topic that I wanted to end with. When I think of the NIH, I think of the public investment into science, the public investment into pushing the boundaries of clinical medicine. It brings to mind President Obama's State of the Union Address this year, where he put forward an initiative in precision medicine.

Number one, Euan, were you excited about that? Number two, did it surprise you? Number three, what is this and do you think it really can make a difference?

Dr Ashley: All really good questions. It definitely was exciting. I think for anything, however defined, that's in the field of data and personalized, individualized medicine, to be given that kind of platform for discussion is very exciting. Those of us who work in this general area were certainly very happy to hear it given a platform.

It's interesting that he used the term "precision medicine" rather than the terms that we're maybe more used to, such as "personalized medicine" or "individualized medicine." That comes out of our desire to emphasize the breadth of the opportunity here. We described earlier in the conversation some examples of personalized medicine where we could use somebody's genome to try to individualize their care. Precision medicine has another aspect to it, which is that if we can use those kinds of tools—they could be big data tools, they could be omics tools, they could be standard clinical analyses — if we can use those tools and those kinds of analyses to understand diseases better, then we can be much more precise in our development of drugs.

I can give you an example of that, the one that was referred to both in the State of the Union address and the follow-up press conferences: cystic fibrosis. We've understood the genetic basis of cystic fibrosis for many years. It was Francis Collins, the director of the NIH himself, who was part of the team that originally found the gene for that condition—the CFTR gene.

But in the recent past, drug companies have begun to realize that we can define subpopulations with this disease that can be targeted very precisely with medications. In this case, there was a drug that was fast-tracked through the US Food and Drug Administration (FDA) last year called Kalydeco®, which is aimed at a subset of cystic fibrosis. It depended on having the genetic information about the disease present. As the listener will probably know, cystic fibrosis is a disease of the CFTR channel that is either dysfunctional within the cell or dysfunctional on the membrane. So there's a particular subsection of the disease that is targeted by this drug. It was fast-tracked through and has been a huge success in the market.

That's part of the inspiration for the precision medicine initiative. It is that sort of understanding.

There are examples within cardiovascular medicine, too. A company called MyoKardia is starting to think about targeting inherited heart disease in a much more specific way. It's very exciting. It is a new way of doing medicine and science. We're very happy to have it given a national platform.

Dr Harrington: Euan, this has been a fabulous discussion on big data and how it might be useful both in science as well as in clinical medicine. I want to thank you for joining us here today on Medscape Cardiology.

Again, our guest today has been Dr Euan Ashley, who is a cardiologist and associate professor of medicine, and the director of the Stanford big data and biomedicine initiative. Euan, thank you so much for joining us here today.

Dr Ashley: Thanks for having me. It's really been a pleasure.

