Unlike Shakespeare's plays, the mammogram debate unfolds as a slightly different script with the same actors at the same locations. The latest entry from Welch and colleagues shows that since the widespread adoption of screening mammography, the incidence of small breast tumors has increased by more than the incidence of large breast tumors has fallen. In a successful screening program, they reason, any reduction in the number of large tumors discovered should be matched by a corresponding increase in small tumors identified. If screening finds more small tumors than the large tumors it averts, there's overdiagnosis.
In answer to the question of whether screening mammography saves lives or overdiagnoses, the researchers confirmed that it does both. Their neutral position will be contentious, though. The amount of overdiagnosis they report is higher than previous estimates. For 30 fewer large tumors (> 2 cm), screening found 162 more small tumors (< 2 cm). Using their simple framework, and assuming that every reduction in large tumors in the population reduces breast cancer mortality and is attributable entirely to mammography, for every life mammography saves, four women are overdiagnosed.
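The arithmetic behind that framework can be sketched as follows, using only the figures quoted above (the denominator of the rates is not given in the text, and the variable names are mine, not the paper's):

```python
# Sketch of the overdiagnosis arithmetic in Welch and colleagues' simple
# framework. The figures (30, 162) are from the text; everything else is
# a labeling assumption for illustration.

fewer_large = 30    # reduction in large tumors (> 2 cm) since screening began
excess_small = 162  # increase in small tumors (< 2 cm) over the same period

# If every averted large tumor was instead caught while small, then 30 of
# the 162 excess small tumors represent genuinely earlier detection; the
# remainder, on this framework, are overdiagnosed.
overdiagnosed = excess_small - fewer_large  # 132

# Assuming each averted large tumor corresponds to a life saved, the ratio
# of overdiagnoses to lives saved is:
ratio = overdiagnosed / fewer_large
print(overdiagnosed, round(ratio, 1))  # prints: 132 4.4
```

Hence "four women overdiagnosed for every life saved": the ratio is about 4.4, and 132 of the 162 excess small tumors (roughly 4 out of 5) count as overdiagnosed under these assumptions.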
It's unlikely that radiologists specializing in breast imaging will find these estimates believable, let alone palatable. The idea that 4 out of 5 invasive breast cancers, less than 2 cm, in 45-year-old women are overdiagnosed—meaning that even if these women live to 80, the detection and treatment of these cancers was futile—will be hard to fathom. A figure in the paper shows that between 1975 and 1979, before screening mammograms were adopted, the 10-year risk for death from invasive tumors measuring 1-1.9 cm was nearly 20%. If these nonpalpable small invasive tumors were not innocuous in 1975, why would they be innocuous today? The corollary of Welch and colleagues' analysis is that many invasive cancers remain quiescent or regress, and though some cancers, notably neuroblastoma, do regress, regression is contentious in breast cancer, in part because it is so hard to demonstrate. To demonstrate regression you'd have to withhold treatment. Good luck with that.
Overdiagnosis is a controversial issue in breast cancer, partly because epidemiologists and radiologists publish in silos which have become fiefdoms at perpetual war with each other. Neither side has a Marcus Aurelius. When epidemiology and cancer biology are at odds, rather than reconcile with one another, the temptation is to prove the other wrong. But mostly overdiagnosis is contentious because it is difficult to study and quantify.
The ideal way to quantify overdiagnosis is a randomized controlled trial (RCT) powered for a realistic effect size, in which participants are followed for at least 30 years to tease out lead-time bias. But in 30 years' time, testing and treatment will improve, invalidating the trial. Meanwhile, you have to make a decision about screening.
To facilitate that, researchers use databases such as SEER (Surveillance, Epidemiology, and End Results), which records tumor size, node involvement, and treatment, but doesn't record whether the patient had a mammogram—a limitation if you wish to precisely quantify overdiagnosis and survival benefits from screening. But SEER allows you to see trends and to make inferences from the trends, so long as you take the inference with a grain of salt or, in statistical terms, place a wide confidence interval around the point estimates.
My bone of contention with Welch's analysis isn't the use of SEER but two assumptions about the natural history of breast cancer. It's logical that large breast tumors that kill were once smaller tumors, and that small tumors can become large tumors, but it doesn't logically follow that a small tumor needs to become a large tumor before it metastasizes and kills. This means that the excess small tumors uncovered by screening are not necessarily overdiagnoses.
For the second assumption, we have to revisit lead-time bias, which is when screening finds a tumor before it is clinically evident but the patient still dies from the tumor. This gives the patient a spurious survival benefit because although the patient lives "longer" knowing they have the tumor, they die at the same time they would have if the tumor had never been discovered with screening. Crucially, in lead-time bias, the tumor progresses. In overdiagnosis, the tumor is not supposed to progress; it remains indolent or regresses. Is overdiagnosis merely lead-time bias that hasn't revealed itself? This is a crucial question. If the excess small cancers never progress, the overdiagnosis rate will be higher. If the excess small cancers merely reflect lead-time bias, the overdiagnosis rate will be lower. The entire mammogram debate can be reduced to this question.
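A toy example makes the mechanics of lead-time bias concrete (all ages here are invented for illustration, not taken from the study):

```python
# Toy illustration of lead-time bias. A tumor destined to kill a woman at
# age 70 is found by screening at 58 instead of presenting clinically at
# 62. Survival "from diagnosis" lengthens, but the age at death does not
# move. All numbers are hypothetical.

age_at_death = 70
clinical_diagnosis_age = 62  # when the tumor would have become palpable
screen_diagnosis_age = 58    # screening finds it 4 years earlier

survival_without_screening = age_at_death - clinical_diagnosis_age  # 8 years
survival_with_screening = age_at_death - screen_diagnosis_age       # 12 years

# Measured survival improves by exactly the lead time...
assert survival_with_screening - survival_without_screening == 4
# ...yet mortality is unchanged: she dies at 70 either way.
```

The point of the sketch: a database recording only survival from diagnosis cannot, by itself, tell this woman apart from one whose small tumor would never have progressed at all, which is why lead-time bias and overdiagnosis are so hard to disentangle.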
How overdiagnosis is inferred doesn't help the distinction between overdiagnosis and lead-time bias. Statistically, we know that cancers are overdiagnosed when they don't kill the patients. Consider Mary, in whom a small tumor is found by screening in middle age. The tumor is resected and Mary dies at 75 in a car accident. Mary's counterfactual in a parallel universe, Sally, has a fungating breast mass that responds dismally to chemotherapy. She is like Mary in every other respect and dies at 75 in a car accident. Is this overdiagnosis or lead time? How would you make this distinction in a database?
Overdiagnosis Is Real
Nevertheless, there are takeaways from Welch's study. Regardless of the precise amount of overdiagnosis, it is indisputable that overdiagnosis exists. To deny overdiagnosis in breast cancer screening is to deny that the earth is an oblate spheroid. The best estimate of overdiagnosis comes from the Canadian National Breast Screening Study which, though far from perfect, followed women for a long time and found 20% excess cancers in the screening arm. My hunch is that the overdiagnosis rate is 20%-30%, not 80%.
Overdiagnosis is not failure to diagnose but failure to prognosticate. The emphasis in mammography has been on reducing false positives—ie, on diagnostic efficacy. In Fryback and Thornbury's model of the efficacy of a diagnostic test, diagnostic efficacy is only the second stage of a six-stage pyramid. Breast imagers have focused on reducing false positives partly because the US Preventive Services Task Force (USPSTF), among others, has lamented that women are hurt by "anxiety from false positives"—a laughably absurd contention in a nation obsessed with health and where public health operates by yanking people's anxieties of dying off the scale.
The focus instead should have been on the overdiagnosis-overtreatment complex, with radiologists and pathologists, unified, working with oncologic surgeons. As Welch and colleagues point out, screening is anatomically based, but tumor behavior is molecularly based. Screening is like the teacher who catches the class troublemaker, but who also impugns the innocent nerd for something he doesn't plan on doing. While the distinction between anatomy and molecular pathology is nifty, it's hard to see how to dispense with anatomy. Anatomy is the gateway to pathology. You can't snip body parts randomly; anatomical imaging tells you where the pathology is located. Perhaps liquid biopsy, which detects tumor DNA in the blood, will dispense with anatomy when we can precisely (still empirically) treat tumors. But we're not there yet. Size is a blunt instrument, but it still matters, even though it fails us when we seek precision.
We Need an RCT, We Can't Wait for an RCT
The only way to unravel the mystery of what happens to small invasive breast cancers is an RCT. It might be easier finding out what came before the multiverse (my money is on Om, followed by Brahma followed by Vishnu). In the RCT, women with tumors < 2 cm would be randomly assigned to "active treatment" and "watchful waiting" arms. "Watchful waiting" is hardly an alien concept; urologists have been watching prostate cancer wistfully for eons. UCSF surgical oncologist Laura Esserman, MD, presents that option to her patients with low-grade DCIS, and there are such trials beginning in Europe.
It would not be easy to conduct this RCT in the United States. Imagine consenting women for the trial: "We don't know if this invasive tumor, the one we've been frightening you to death about, will progress or regress. Can you sign at the bottom? Thank you." The trial would be dead on arrival. The crossover rate from the "watchful waiting" to the "active treatment" arm would be so high that it would bludgeon the conclusions. And the study would be decimated when one woman in the "watchful waiting" arm has an aggressive tumor that metastasizes everywhere before the next surveillance mammogram. Statistically speaking, this will happen. However, such a trial can succeed in less risk-averse places such as Europe and Australia.
So we're left with the same questions after Welch's study as we were before Welch's study: What is the precise overdiagnosis rate and precise survival benefit of screening mammograms? I will ask the question differently: What is the maximum overdiagnosis rate we'll tolerate and the minimum survival benefit we'll accept to continue with screening mammograms? Addressing my question, whether by the USPSTF, public policy makers, mammogram advocates, screening skeptics, or professional organizations, requires courage. Playing table tennis with empiricism is a good way of avoiding tough questions.