# How to Win the Marathon: A Common Statistical Error Can Help

Andrew J. Vickers, PhD

September 18, 2008

A friend of mine is an avid runner who trains hard to post a good time in the New York City Marathon. Indeed, I remember that we once headed back from the beach early because she had planned a 20-mile run and didn't want to throw off her training schedule. Being a statistician, I know a much easier way to shave off a couple of minutes from a race: Just don't start your stopwatch until you've been running for a bit.

Now if that sounds like cheating, just consider the following graph, which is sometimes seen in papers describing cancer studies. This shows survival time separately by tumor "response," defined as whether or not the tumor shrank by at least 50% after treatment.

Figure. Tumor response.

The graph suggests that patients who have a response (grey line) live longer than those who do not respond (black line). If we compare groups statistically, using what is known as a log-rank test, we get a P value of .025, from which we might well conclude that yes, if you respond to chemotherapy, you'll live longer.

The problem is that you can't have a response if you die before your tumor can be assessed, so anyone who dies early is automatically defined as not responding. Indeed, if you look carefully, you'll notice that no responders die until a couple of months after treatment has started, which is when patients start to get scans to see whether their tumors have changed in size. Now, I know for a fact that response is not associated with outcome in this dataset because I created it artificially using random numbers, altering the data so that any (imaginary) patient who died before 2 months was counted as not responding, even if the patient would have responded had he or she lived.
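For readers who want to try this at home, here is a minimal sketch in Python of how a dataset like this could be built. It is not the code behind the figure: the exponential survival times, the 12-month mean survival, the 30% response rate, and every function name are assumptions chosen purely for illustration. Only the 2-month cutoff comes from the description above. To keep things short, it compares mean survival between groups rather than running a log-rank test.

```python
import random
from statistics import mean


def simulate(n=10_000, response_rate=0.3, mean_survival=12.0,
             assessment_month=2.0, seed=0):
    """Return a list of (survival_months, would_respond, recorded_response).

    All parameter values are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        # Survival is drawn independently of response, so by construction
        # response has no effect on outcome.
        survival = rng.expovariate(1.0 / mean_survival)
        would_respond = rng.random() < response_rate
        # The error: a patient who dies before the first scan can never be
        # recorded as a responder, whatever would have happened otherwise.
        recorded_response = would_respond and survival >= assessment_month
        patients.append((survival, would_respond, recorded_response))
    return patients


def mean_survival_by_group(patients, column):
    """Mean survival for (responders, non-responders) using the given column.

    column=1 uses the true (unobservable) response status;
    column=2 uses the status as it would be recorded in a study.
    """
    responders = [p[0] for p in patients if p[column]]
    non_responders = [p[0] for p in patients if not p[column]]
    return mean(responders), mean(non_responders)


if __name__ == "__main__":
    data = simulate()
    print("true status:     ", mean_survival_by_group(data, 1))
    print("recorded status: ", mean_survival_by_group(data, 2))
```

Grouped by true status, the two means should come out about the same. Grouped by recorded status, responders appear to live longer, even though nothing in the simulation makes response affect survival.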

Avoiding this type of analysis is a routine battle in my day-to-day work. Here is a recent example: A lab scientist contacted us, all excited, saying that he had identified a molecule in the blood that could predict cancer recurrence, and here was the statistical print-out to prove it. When we looked at his analyses closely, we saw that he had "started the clock" at the time of the patient's first chemotherapy treatment. But many patients had disease recurrence before chemotherapy was over, and thus before the molecule had even been measured in their blood. In other words, our collaborator was trying to predict something that had often already occurred.

Two other examples of starting the clock at the wrong time are so common and problematic that they have been given their own special names: "intent to treat" and "lead-time bias." As a quick illustration of the "intent to treat" problem, let's say that we wanted to see whether surgery was effective for heart disease; we thus looked at time from diagnosis to death in a cohort of men, some of whom were surgically treated. In this analysis, any patient who died before a scheduled operation would be in the "no surgery" group, making it look as though surgery is effective (indeed, the survival curves would look just like the figure above).

In regard to the "lead-time bias" problem, take the case of an incurable disease that inevitably leads to death within a year or so. Now imagine that someone invented a special blood test that could detect the disease 5 years early, although it was still untreatable. If we did a study, we would find that patients diagnosed by the blood test live 6 years, compared with only 1 year in patients diagnosed clinically. This would appear to show that the blood test was helpful. But of course it is no better than cheating in the marathon.
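The arithmetic is simple enough to write down. The short Python sketch below is only an illustration: the age at death of 70 and the function name are hypothetical, and the 1-year and 5-year figures are the ones from the example above. It shows that the two patients die at exactly the same age; the only thing that changes is when the stopwatch was started.

```python
def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Years lived after diagnosis: the quantity a naive study would compare."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient: the disease is untreatable, so the age at death
# is the same no matter how the disease was found.
AGE_AT_DEATH = 70.0

# Clinical diagnosis comes about 1 year before death.
age_at_clinical_diagnosis = AGE_AT_DEATH - 1.0

# The blood test finds the same disease 5 years earlier.
age_at_blood_test_diagnosis = age_at_clinical_diagnosis - 5.0

clinical_survival = survival_from_diagnosis(age_at_clinical_diagnosis, AGE_AT_DEATH)
blood_test_survival = survival_from_diagnosis(age_at_blood_test_diagnosis, AGE_AT_DEATH)

# The apparent benefit is exactly the lead time, not extra life.
apparent_benefit = blood_test_survival - clinical_survival

if __name__ == "__main__":
    print("survival after clinical diagnosis:", clinical_survival)
    print("survival after blood-test diagnosis:", blood_test_survival)
    print("apparent benefit (years):", apparent_benefit)
```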

"Intent to treat" and "lead-time bias" will be discussed in future articles.