COMMENTARY

# Imputation -- Filling the Gaps in the Data

Henry R. Black, MD; Andrew J. Vickers, DPhil


May 05, 2011


Helping Clinicians Understand Statistics

Hi, I'm Dr. Henry Black, Clinical Professor of Internal Medicine at the New York University School of Medicine [New York, NY] and immediate past President of the American Society of Hypertension. Something has been troubling me of late. As we begin to analyze new data from clinical trials, particularly trials that bear on how we practice, the analyses are getting more complicated. For clinicians who aren't familiar with newer biostatistical methods, it is becoming more difficult to interpret these trials and figure out exactly what they mean.

I'm here with Dr. Andrew Vickers from Sloan-Kettering. Andrew, would you introduce yourself and tell us what you do?

Andrew J. Vickers, DPhil: I'm Dr. Vickers, a biostatistician at Memorial Sloan-Kettering Cancer Center [New York, NY]. Prostate cancer is my specialty, but I also do some writing and make videos for Medscape. I'm very interested in helping people learn and understand statistics, which is what brought us together today.

What Is Imputation?

Dr. Black: Excellent. One thing that troubles me is how we handle things like imputation and sensitivity analysis when we don't have a large trial that directly answers the question. What's your approach to that?

Dr. Vickers: What does imputation mean? Some people dismiss it as a "statistical guess," and defenders of imputation reply, "Well, if you don't impute, that's a type of guess as well." Imputation means that when a patient doesn't provide data, we make an estimate of what the data would have been had the patient provided them. An obvious example: You have a set of patients in a pain trial, and 1 year after treatment you would like them to tell you how much pain they are in, so you can see whether the treatment helped control their pain. It's a big questionnaire; you call a patient up, and they say, "I'm done with your trial; I'm not going to fill it in," so now you have a blank in your dataset. What are you going to do about that?

For years, the solution was just to ignore the missing data, but some statisticians pointed out that ignoring it is itself a type of imputation: you are imputing that what the patient would have marked on the questionnaire is the mean, the average of their group.
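To make the "ignoring is imputing the mean" point concrete, here is a minimal Python sketch with made-up scores (the list `arm` and the helpers `complete_case_mean` and `mean_impute` are hypothetical, for illustration only). Filling each missing questionnaire with the observed mean of its group leaves the group mean exactly where a complete-case analysis puts it, but it also makes the data look less variable than they really are, because every filled-in value sits right at the mean.

```python
from statistics import mean, pstdev

def complete_case_mean(scores):
    """Mean of the observed scores only; None marks a questionnaire that was not returned."""
    return mean([s for s in scores if s is not None])

def mean_impute(scores):
    """Replace every missing score with the observed mean of the same group."""
    fill = complete_case_mean(scores)
    return [fill if s is None else s for s in scores]

# Hypothetical arm on a 0-10 pain scale: 4 questionnaires returned, 6 missing.
arm = [6, 4, 5, 5, None, None, None, None, None, None]
observed = [s for s in arm if s is not None]
imputed = mean_impute(arm)

# Same group mean either way: ignoring the missing patients amounts to imputing the mean.
print(complete_case_mean(arm), mean(imputed))
# But the mean-imputed data look less spread out than the observed data.
print(pstdev(observed), pstdev(imputed))
```

The shrinking spread is one reason naive single imputation can overstate how certain a result is.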

Last Observation Carried Forward

Dr. Black: Is that different from the last observation carried forward?

Dr. Vickers: We can come back to "last observation carried forward"; that's another type of imputation. What I'm describing now is implicit imputation. For example, suppose you have a trial with 100 patients in each of 2 arms, only 10 patients in each arm fill in their questionnaires, and the average is 5 in one group and 4 in the other (on a 0-10 pain scale). What about the other 90 patients in each arm? If you just ignore them when you report, you are implicitly saying that their scores would also have averaged 5 and 4.

Dr. Black: You can't do that.

Dr. Vickers: Right, but that's what you're implicitly doing. Statisticians started to say, "If you don't impute, you're just imputing the mean, and that may not be logical," so we should actually try to work out what these patients would have said had they filled in their questionnaires. One of the most common methods that you see in the literature is "last observation carried forward." To take the example I gave: You collect pain data at baseline, and then the patient fills in the questionnaire at 3 months and at 6 months, but at 12 months they don't fill in their 12-month questionnaire. You then take their 6-month questionnaire and say that's what they would have filled in at 12 months, had they completed it. That's last observation carried forward.

In fact, there is a lot of statistical work showing that last observation carried forward is worse than useless, that is, worse than just assuming the mean, and there are many reasons for that. The most obvious example would be a dementia trial in which patients worsen over time. Patients decline at different rates, but when someone stops filling in their questionnaires, last observation carried forward makes their scores suddenly flatten out instead of continuing to go down; they don't get any worse. In other words, you are imputing no change for that patient, and that's not necessarily rational.

Dr. Black: There is always likely to be a change.
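Last observation carried forward is simple enough to sketch in a few lines of Python. The function name `locf` and all scores below are hypothetical, chosen to mirror the dementia example: a patient whose cognitive score (higher is better) is assumed to keep falling about 2 points every 3 months, but whose 12-month questionnaire is missing.

```python
def locf(visits):
    """Last observation carried forward: each missing visit (None) takes the most
    recent observed value; visits before the first observation stay None."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# Hypothetical cognitive score at baseline, 3, 6 and 12 months;
# the 12-month questionnaire was not returned.
recorded = [28, 26, 24, None]

# Hypothetical "true" course: the patient keeps losing about 2 points every 3 months,
# so at 12 months (two more 3-month intervals after month 6) the score would be 20.
true_course = [28, 26, 24, 20]

imputed = locf(recorded)
print(imputed)                          # [28, 26, 24, 24]
print(imputed[-1] - true_course[-1])    # 4
```

Because the missing value is copied from month 6, the imputed patient shows no further decline, so in this made-up case the 12-month score is overstated by 4 points.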

Using Statistical Models

Dr. Vickers: There is always going to be change, so statisticians create statistical models. They say, "We know how old the patient is; we know what their baseline pain score was; we know all this information about the patient. We can put that information into a statistical model to try to work out, in a pain trial, what they would have written on their questionnaire." I'm involved in a lot of biomarker studies, where the question is what the biomarker would have been had we measured it. We use statistics to make a best guess about what the data would have been had we received them. There are various levels of complexity, but the best methods take into account the fact that it's a statistical guess. A very naive way of doing it would be to create your statistical model and then say, "Mr. Brown has a pain score of 7." A more sophisticated way would be to say, "We know that's just a guess, so in our analysis, if Mr. Brown actually wrote 7, we'll treat that one way, and if we're guessing he wrote 7, we'll treat that in a slightly different way."
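The two approaches in the Mr. Brown example can be sketched in Python under heavy simplifying assumptions: the data are hypothetical, the model is a straight line that uses only the baseline pain score (a real model would add age and other patient information), and the helper names `fit_line`, `impute`, and `pooled_mean` are invented for illustration. The naive version plugs in one predicted score as if the patient had written it. One common way to put the more sophisticated idea into practice is multiple imputation: create several completed datasets with random noise added to each guess, analyze each one, and combine the results with Rubin's rules so that disagreement between the datasets widens the standard error. This sketch is simplified: it only adds residual noise, whereas a full multiple-imputation procedure would also redraw the regression coefficients for each dataset.

```python
import random
from statistics import mean, variance

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual standard deviation)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
    return a, b, sd

def impute(baseline, followup, rng=None):
    """Fill missing follow-up scores (None) using a line fitted to complete cases.
    rng=None gives one deterministic prediction per patient (naive single imputation);
    passing a random.Random adds a random residual to each prediction."""
    pairs = [(x, y) for x, y in zip(baseline, followup) if y is not None]
    a, b, sd = fit_line([x for x, _ in pairs], [y for _, y in pairs])
    completed = []
    for x, y in zip(baseline, followup):
        if y is None:
            y = a + b * x
            if rng is not None:
                y += rng.gauss(0, sd)
        completed.append(y)
    return completed

def pooled_mean(baseline, followup, m=20, seed=1):
    """Simplified multiple imputation of the mean follow-up score, pooled with
    Rubin's rules. Returns (pooled estimate, pooled standard error)."""
    rng = random.Random(seed)
    estimates, within = [], []
    for _ in range(m):
        data = impute(baseline, followup, rng)
        estimates.append(mean(data))
        within.append(variance(data) / len(data))   # squared standard error of this mean
    between = variance(estimates)                   # spread of estimates across imputations
    total = mean(within) + (1 + 1 / m) * between    # Rubin's rules total variance
    return mean(estimates), total ** 0.5

# Hypothetical pain trial (0-10 scale): baseline and 12-month scores, None = not returned.
baseline = [8, 6, 7, 5, 9, 4, 6, 7, 8, 5]
month12 = [5, 4, 4, 3, 6, None, None, 5, None, 3]

single = impute(baseline, month12)            # guesses treated as if observed
naive_se = (variance(single) / len(single)) ** 0.5
est, mi_se = pooled_mean(baseline, month12)
print(round(mean(single), 2), round(naive_se, 2))
print(round(est, 2), round(mi_se, 2))
```

The naive standard error treats every guessed score as though the patient had actually reported it, so it ignores the uncertainty in the guesses. The pooled standard error adds a between-imputation term, which is how this approach treats a guessed 7 differently from a reported 7.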

Dr. Black: Dr. Vickers, thank you very much for your input. Right now, when we're trying to decide what to do for a patient, what to tell a patient, and how a patient should interpret what they read, it's very valuable that we understand how this works. We appreciate your time.

Dr. Vickers: Thank you; it was fun to be here.