For Better Clinical Research, Collaborate With Statisticians

The Early Days of the Duke Databank

Robert A. Harrington, MD; Frank E. Harrell, PhD


August 16, 2018

Robert A. Harrington, MD: Hello. This is Bob Harrington, from Stanford University.

I'm really privileged to have joining me as a guest my long-time friend and colleague Dr Frank Harrell. Frank is professor of biostatistics at Vanderbilt University School of Medicine. He's also an expert statistical advisor to the US Food and Drug Administration Center for Drug Evaluation and Research and their biostatistics group.

Frank E. Harrell, PhD: Absolute pleasure to be here, Bob.

It Started With Mom

Harrington: Frank, you and I have known each other and have worked together many times over the years. For our listeners, why don't you give a little of your educational background and explain how you ended up coming out of your PhD training at Duke University? Maybe we'll get into a little bit of a history of the Duke Databank at that point.

Harrell: I had a really interesting background, at least to me. One summer, when I was a bored 15- or 16-year-old, my mother said, "Did you know that they've got volunteer opportunities at the VA hospital in Birmingham?" I took that opportunity and started working with physicians for the first time. This was in gastroenterology, and I started to see the physiologic data they were collecting for motility studies.

I crossed over to working in cardiology as I became an undergraduate student at the University of Alabama, Birmingham (UAB). I worked with a lot of physiologic data and also helped automate the pressure waveform analysis in their catheterization lab. Even in those early days, the close relationships with clinician investigators were something that really thrilled me. It was very challenging and fun, and I also realized that I liked the math part of it and didn't like the long hours of the clinicians I was working with. That helped push me.

Finding Biostatistics From 'Unbelievable Advice'

Harrell: Then I had unbelievable advice from the head of biostatistics at UAB, David Hurst, who was a visionary. He said, "You need to go into biostatistics. Go to the University of North Carolina (UNC), and your supporting program needs to be biomedical engineering and computer science." This was in 1973, and I did exactly what he said, except my supporting program was biomedical engineering and physiology. That was the best advice I've ever gotten—it was just unbelievable.

Harrington: When you think about it, that was visionary thinking and years ahead of where the field was going.

Harrell: It was just incredible. Dave Hurst was somebody who was a data guru. He knew more things to do with data than anyone, and he put multiple tools together that no one else thought of putting together. He was a real inspiration for me. He helped me get aligned in a career where he said, "Not only will you have fun and have great challenges, but the job opportunities are unlimited." And that has always been the case. When I was a graduate student at UNC Chapel Hill, I worked on probably one of the most expensive clinical trials the National Institutes of Health (NIH) ever undertook, the Lipid Research Clinics program studying cholestyramine for reduction of cardiovascular death and myocardial infarction. We were the data coordinating center.[1]

I learned something about clinical trials, data, data management, and research operations of clinical trials. Kerry Lee, my long-time colleague at Duke University whom you know very well, was on my dissertation committee at UNC Biostatistics, and he recruited me to Duke. Joining the Duke Databank for Cardiovascular Disease was also one of the best things I ever did. That was in 1978. Bob Rosati and Phil Harris were there. Shortly thereafter, it was David Pryor, Mark Hlatky, and Rob Califf, and then you came along a little bit later.

Duke Databank

Harrington: It was a pretty exciting time. Let's pause for a minute and let the audience know the concept of the Databank. As I understand the history, Eugene Stead, who had been the chair of medicine in the '60s and '70s at Duke, had this notion that the human brain could not aggregate and analyze all of the data that were coming in through the clinical practice setting. He thought, "Let's use this new device, the computer, to collect, aggregate, and then help analyze the data." In some ways, it was that simple. But in some ways, it was so far ahead of what the rest of the world was doing at that time.

Harrell: Gene Stead was so far ahead, it's hard to even fathom it. He believed in data, and he got us to formulate something called the "prognostigram," where every patient coming in with chest pain getting a cardiac catheterization got a customized survival curve estimate based on the current follow-up in the Duke Databank. It was customized to their characteristics, especially their coronary obstructive disease and other risk factors, and it compared the survival curve for those getting coronary artery bypass graft (CABG) with those getting medical therapy. I think Bob Rosati probably called it a prognostigram, but Gene Stead called it a "second opinion," and he got insurance companies to pay for this as a second opinion, I believe.

Harrington: I didn't know that last part about the insurance companies. That sounds very Stead-like though, doesn't it?

Harrell: Yes.

Harrington: I'm sitting in my office now at Stanford, and on my wall I have a picture of the variables in the multivariate analysis taken from the first article about the prognostigram. They gave me one of the pictures when I left the Databank years ago. Next to that is a picture of the 25th anniversary of the Databank faculty with you and me in the picture together. It's kind of fun to hear you talk about the prognostigram. It was a fundamental observation, was it not, that data could more effectively drive clinical decision-making than anecdote?

The Duke Databank Faculty 25th anniversary. Circled: Robert A. Harrington (back) and Frank E. Harrell (front). Image courtesy of Robert A. Harrington, MD

Harrell: It certainly was. Gene Stead had another attribute. He had had a long history of working with UNC Biostatistics, and he thought biostatisticians were important team players in the new collaborative environment. He fostered that and helped recruit that way, and kept a bridge with UNC Chapel Hill. The respect that he showed for people who were not MDs was very obvious, and that fostered the great start we had at the Duke Databank, which led into the way the Duke Clinical Research Institute was created.

Harrington: Let's talk about the early Databank days. You are hitting upon some key lessons that have real relevance today. One of the key lessons of the Databank is that clinical research is a team sport, and you need people coming from very different backgrounds. What always struck me about the Databank and the conferences we would have was that there were statisticians who understood the clinical issues and clinicians who understood the quantitative issues. Could you talk about that and tell us about some of those early days of the Databank conferences, which as a fellow in the Databank were absolutely frightening?

Early Days of Databank Conferences

Harrell: Yes, we tried to make them frightening in a fun sort of way. The statisticians got a kick out of challenging the clinicians, and clinicians gave it right back to the statisticians. Some of that was intentional, but I think it was always in the most fun, intellectual spirit. Before we had those formal conferences, we had informal conferences in Room 2000 of the old Duke Hospital, which was a big computer room. We had this big, old mainframe computer there.

We would bring some chairs out to the middle of this big room for the cardiologists and the statisticians. At that point, it was Kerry Lee and me. We would have these almost staged arguments. They were intended to be loud and challenging, because everyone knew that what motivated everyone else was seeking a better approach to a certain research problem.

One of the most vivid memories I have is sitting around that room having an argument about how to treat New York Heart Association (NYHA) class for congestive heart failure as an adjustment variable in a regression model. Should we have class I to IV as some sort of linear effect, or should we not assume anything about the impact of that NYHA class by having just three indicator variables? And then what would the reference group be? We had this big argument about what the reference would be when you are coding a four-level categorical variable, because we wanted all of our statistical results going forward to be interpretable by the clinicians.
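Editor's note: the coding choice described here can be illustrated with a minimal sketch. It is not the Databank's actual code; the function names and the default reference class are assumptions made purely for illustration. It shows the two options side by side: a single score that assumes equal steps between classes, and three 0/1 indicator variables whose meaning depends on the reference group chosen.

```python
# Illustrative sketch only: names and defaults are hypothetical, not from the Databank.
NYHA_CLASSES = ["I", "II", "III", "IV"]


def linear_code(nyha):
    """One column: classes I-IV scored 1-4, which assumes equal steps between classes."""
    return [NYHA_CLASSES.index(nyha) + 1]  # raises ValueError for an unknown class


def indicator_code(nyha, reference="I"):
    """Three 0/1 columns, one per non-reference class; the reference class is all zeros."""
    if nyha not in NYHA_CLASSES or reference not in NYHA_CLASSES:
        raise ValueError("NYHA class must be one of I, II, III, IV")
    levels = [c for c in NYHA_CLASSES if c != reference]
    return [1 if nyha == c else 0 for c in levels]


def indicator_names(reference="I"):
    """Column labels; each coefficient compares one class with the reference class."""
    return [f"NYHA {c} vs {reference}" for c in NYHA_CLASSES if c != reference]


if __name__ == "__main__":
    for nyha in NYHA_CLASSES:
        print(nyha, linear_code(nyha),
              indicator_code(nyha, reference="I"),
              indicator_code(nyha, reference="IV"))
```

With the linear score, a model estimates one slope for all classes. With indicators, it estimates three separate contrasts, and changing the reference group changes what each coefficient means to a clinical reader, which is why the choice was worth arguing about.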

We had the luxury of arguing about those sorts of things as long as we wanted to. There was a difference in those days; the pressures that clinicians were under were far different than they are today. I would say there was much less pressure because there was not the same case load, billing, resource units, and all of those things.

Another big factor was that when you would submit an NIH grant proposal and get a grant (our group was ahead of the databased way of thinking about cardiology, and we were pretty good at getting grants), the efforts that were supported by the grant were really adequate—sometimes even more than adequate—to the goals of the grant. You would have people on 25%, 30%, 40% effort sometimes.

We now have this super-competitive mode of operating, and of course, there are a lot more researchers competing for funds. The generous NIH funding in those days was a luxury that we will probably never see again. It was something to cherish, because the kind of collegial interactions that it created were truly magical.

Harrington: I think that is one of the great lessons of the Databank: Whether you are a clinician or biostatistician, you need time to devote to learning the issues germane to the field, and the two teams need time to come together. I fully agree with you; those days of being able to argue and discuss and really dissect a problem were critical in terms of moving the field forward.

As you talked about how you thought about NYHA class and how you would insert it into a model, it reminds me that one of the great achievements of the group during the '70s and '80s was thinking about how to handle nonrandomized data when making treatment inferences. That has some relevance today. Do you want to talk about how people thought about that issue?

Making Treatment Inferences

Harrell: Yes, it was controversial even then. There were early randomized trials, including one that was done in the Veterans Administration system for bypass surgery versus medical therapy in obstructive coronary artery disease.[2] Even early on, in the '70s or so, people realized how limited the ability to randomize patients was, especially enough patients in specific disease severity groups, where maybe the practicing community did not think there was equipoise.

We were believers in observational data, and over time, we came to believe that our trust in observational data was just a little bit too great. The success of randomized trials was maybe slightly overestimated. Somewhere in the middle, there is a truth. But there were a lot of arguments about this, and we went to a lot of conferences where people would present randomized trial data with people who would present observational data and have big arguments with each other.

Of course, some of the ones that were easiest to argue with were surgeons presenting their experience at their medical center with bypass surgery with no control group whatsoever. That was pretty easy to go after, but the surgeons didn't quite understand the weakness of that approach.

It was always controversial, and over time, we started doing very strong comparisons with observational data predictions of how patients were going to fare with the different treatments and compared and matched them as best as we could with the randomized trial data as those data were published. Mark Hlatky really led this effort. It's quite an interesting story, and never super-clear and never without controversy.

Harrington: Frank, I could keep talking with you all day. My guest has been Frank Harrell, professor of biostatistics at Vanderbilt University School of Medicine. Frank, thanks for joining us.

Harrell: Thank you for having me. I really enjoyed it, Bob.

