# Surviving Survival Analysis: Why Censoring Is No Bad Thing

Andrew J. Vickers, PhD


June 13, 2008

Fine wines mature over time; fraternity parties often feature immature behavior. When statisticians use the term "mature," however, they are probably referring not to a leisure activity but to something like data on cancer deaths.

Cancer data are said to mature because we want to know how long patients live, and we don't know exactly how long they live until they die. As time passes and patients die (statisticians refer to this rather euphemistically as "events accruing"), we get data on how long they survived and can do some statistical analysis.

As a simple example, imagine that in a clinical trial we treat 2 patients with advanced cancer on the same day, one with veryplatin and the other with ratherplatin. If we check back after 1 week and neither patient has died, we have no information on which chemotherapy agent works best. Two years later we check again and find that the data are mature: The patient on veryplatin lived for 12 months, whereas the patient receiving ratherplatin died after only 6 months. This pair therefore contributes some evidence to the trial analysis that veryplatin is more effective.

Generally speaking, we can't delay analyzing our results until all patients have died (in a disease such as prostate cancer, this may take 10 or 20 years), so we need statistical methods for estimating survival when some patients in a study are still alive.

To illustrate these methods, we use the case of 6 students enrolling in a statistics course (say, on survival analysis). The dates on which each student joined and left the course are shown in the Table. We'll imagine that we close our study on December 1 to analyze the results. You'll notice that Bob, Erica, and Peter are still in the course at that point, so we know that they "survived" the course for 6, 4, and 1.5 months, respectively, but we don't know how much longer they would have stuck it out. In statistical speak, we describe Bob, Erica, and Peter as "censored." In the study database, this is recorded with 2 separate variables: one giving the length of follow-up (eg, 1.5 months) and a "status" variable showing whether the participant had died (or, here, left the course) by last follow-up (eg, 1 for dead and 0 for alive).
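
As a minimal sketch of the two-variable layout just described, the Python below turns a join date and an optional leave date into a follow-up time and a status flag (1 = left the course, the "event"; 0 = still enrolled when the study closed, ie, censored). The function name `to_survival_record`, the choice of days rather than months, and any dates you feed it are illustrative assumptions; they are not the values in the Table.

```python
from datetime import date
from typing import Optional, Tuple


def to_survival_record(joined: date, left: Optional[date], study_close: date) -> Tuple[int, int]:
    """Return (follow_up_days, status) for one participant (hypothetical helper).

    status = 1 if the event (leaving the course) was observed by the close date;
    status = 0 if the participant was still enrolled at the close date (censored).
    """
    if left is not None and left <= study_close:
        # Event observed: follow-up runs from joining to leaving.
        return (left - joined).days, 1
    # Censored: we only know the participant "survived" until the study closed.
    return (study_close - joined).days, 0
```

For example, with a hypothetical study closing on December 1, 2007, a student who joined on June 1, 2007 and is still enrolled gets `(183, 0)`, while one who joined the same day and left on July 1 gets `(30, 1)`. A leave date after the close is treated as censored, because on December 1 we could not yet have known about it.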

The way to analyze these data is to think in terms of probabilities. (Statisticians often refer to survival analysis in terms of "cumulative probability.") It is easy to see that the probability of surviving 2 weeks is 100% (as all 6 of our students lasted at least that long) and that the probability of lasting 1 month is 5 out of 6, or 83% (only Joe didn't make it further than 1 month). The trick comes when you try to work out the probability of lasting 2 months. First, you work out the probability of surviving the first month, which we already did (83%). Then, you work out the probability of getting through the second month given that you got through the first month. (We don't know whether Peter would have survived the second month, but we know that Bob, Erica, and Paul did and Mary didn't, so that is 3 out of 4, or 75%.) Finally, you multiply the 2 together: 5/6 x 3/4 = 15/24, or 62.5%.
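
The month-by-month calculation is just a running product of conditional probabilities. This Python sketch uses the counts given in the text: 6 students at risk in month 1 with 1 leaving (Joe); 4 at risk in month 2 with 1 leaving (Mary), Peter being left out because he was censored partway through; and 3 at risk in month 3 with nobody leaving. The function name and the (at risk, events) input format are assumptions made for illustration.

```python
from fractions import Fraction
from typing import List, Tuple


def cumulative_survival(intervals: List[Tuple[int, int]]) -> List[Fraction]:
    """Multiply the conditional probabilities of getting through each interval.

    intervals: (number at risk at the start of the interval, number of events in it)
    Returns the cumulative probability of surviving to the end of each interval.
    """
    survival = Fraction(1)
    result = []
    for at_risk, events in intervals:
        # Probability of getting through this interval, given you reached its start.
        survival *= Fraction(at_risk - events, at_risk)
        result.append(survival)
    return result


# Counts for months 1, 2, and 3, as worked through in the text.
course = [(6, 1), (4, 1), (3, 0)]
for month, s in enumerate(cumulative_survival(course), start=1):
    print(f"month {month}: {float(s):.1%}")
# prints: month 1: 83.3%, month 2: 62.5%, month 3: 62.5%
```

Exact fractions are used so that month 2 comes out at exactly 5/8 (62.5%), rather than the 62.25% you would get by multiplying the rounded percentages.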

Now here is something interesting: The probability of lasting 3 months given that you lasted 2 months is 100%. Therefore, the probability of surviving 3 months in total is the same as the probability of lasting 2 months, 62.5%. This is why a survival "curve," a plot of survival probabilities against survival time, isn't a curve at all but a series of steps (Figure).
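
Because the estimate changes only when someone leaves, reading a probability off the curve amounts to finding the most recent step. A minimal Python sketch, assuming the steps are stored as (time in months, cumulative survival) pairs sorted by time; the name `survival_at` is hypothetical, and only the steps worked out in the text (drops at the end of month 1 and month 2) are included.

```python
from bisect import bisect_right
from typing import List, Tuple


def survival_at(t: float, steps: List[Tuple[float, float]]) -> float:
    """Return the survival probability at time t from a step function.

    steps: (event_time, survival_after_that_time) pairs, sorted by time.
    Before the first step, survival is 1.0; between steps it stays flat.
    """
    times = [time for time, _ in steps]
    i = bisect_right(times, t)  # number of step times <= t
    return 1.0 if i == 0 else steps[i - 1][1]


# Course example, first 3 months only: drops at 1 month and 2 months.
course_steps = [(1.0, 5 / 6), (2.0, 0.625)]
print(survival_at(0.5, course_steps))  # 1.0: everyone lasted 2 weeks
print(survival_at(3.0, course_steps))  # 0.625: no step between 2 and 3 months
```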

Figure. Kaplan-Meier survival estimate.

If you don't like fractions and multiplication, don't worry; the message here isn't in the math but in how to report and interpret the numbers. First, in most medical studies you can't report simple proportions ("rates"), because censored patients weren't followed for the full period: rather than saying that "20% of the patients were alive at 5 years" or that "38 patients (23%) died during the study," you have to say something like "the probability of surviving 5 years was 20%." Second, don't put too much faith in the right-hand "tail" of the survival curve, where few participants are still being followed. It looks as though the probability of sticking with the course for 6 months is about 40%, but only 1 student, Bob, actually stuck it out that long.
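
To see both points in code, this Python sketch computes a Kaplan-Meier (product-limit) estimate from (follow-up, status) pairs and reports the number still at risk at each step. The 5 follow-up times are hypothetical, invented for illustration, and are not the course data. In this made-up example, simply counting deaths gives 40%, whereas the estimated probability of surviving past year 3 is 40% (that is, a 60% probability of dying by then), and that last step rests on only 2 patients still at risk.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], status: List[int]) -> List[Tuple[float, float, int]]:
    """Product-limit estimate.

    times: follow-up for each participant; status: 1 = event observed, 0 = censored.
    Returns (event_time, survival_after_that_time, number_at_risk) at each event time.
    """
    data = list(zip(times, status))
    survival = 1.0
    result = []
    for t in sorted({time for time, s in data if s == 1}):
        at_risk = sum(1 for time, _ in data if time >= t)  # still followed at t
        events = sum(1 for time, s in data if time == t and s == 1)
        survival *= (at_risk - events) / at_risk
        result.append((t, survival, at_risk))
    return result


# Hypothetical study (years): 2 deaths, 3 patients censored.
times = [1.0, 2.0, 2.0, 3.0, 5.0]
status = [1, 0, 0, 1, 0]
naive_died = sum(status) / len(status)  # simple proportion: ignores censoring
print(f"simple proportion died: {naive_died:.0%}")
for t, s, n in kaplan_meier(times, status):
    print(f"t={t}: S={s:.2f} (n at risk={n})")
```

Reporting the number at risk alongside each estimate is one way to flag how little data supports the right-hand tail.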

Incidentally, Bob didn't really like being censored, but admitted that it was better than the alternative.