A Reality Check for Overdiagnosis Estimates Associated With Breast Cancer Screening

Ruth Etzioni; Jing Xia; Rebecca Hubbard; Noel S. Weiss; Roman Gulati

J Natl Cancer Inst. 2014;106(12) 

The frequency of overdiagnosis associated with breast cancer screening is a topic of controversy. Published estimates vary widely, but identifying which estimates are reliable is challenging. In this article we present an approach that provides a check on these estimates. Our approach leverages the close link between overdiagnosis and lead time by identifying the average lead time most consistent with a given overdiagnosis frequency. We consider a high-profile study that suggested that 31% of breast cancers diagnosed in the United States in 2008 were overdiagnosed and show that this corresponds to an average lead time of about nine years among localized cases. Comparing this estimate with the average lead time for invasive, screen-detected breast cancers of 40 months, around which there is a relative consensus, suggests the published estimate of overdiagnosis is excessive. This approach provides a novel way to appraise estimates of overdiagnosis given knowledge of disease natural history.

Overdiagnosis because of breast cancer screening is controversial, with estimates of its frequency varying greatly.[1,2] An overdiagnosed cancer is one detected by screening that would not have presented clinically during the patient's lifetime in the absence of screening, ie, the patient would have died from other causes with preclinical disease. Although there is no consensus regarding the most reliable estimates, an influential article[3] estimated that, in 2008, 31% of breast cancers among women older than age 40 years in the United States were overdiagnosed. This estimate was based on the excess incidence in 2008 relative to a projection of incidence in the absence of mammography.

Here we propose an approach for judging the plausibility of overdiagnosis estimates like this one, leveraging the link between overdiagnosis and lead time (LT), which is the time by which screening advances detection. For a patient, overdiagnosis occurs when the time to other-cause death is less than her LT. In a population, the fraction overdiagnosed is determined by the distributions of LT and other-cause survival. If we know the distributions of LT and other-cause survival, we can infer the chance of overdiagnosis. Similarly, if we know the chance of overdiagnosis and other-cause survival, we can infer the LT distribution.
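The two-way inference described above can be illustrated with a deliberately simplified sketch. It assumes that both LT and time to other-cause death are exponentially distributed, so that the chance of overdiagnosis has a closed form. The constant other-cause hazard and the numeric inputs are assumptions made here for illustration only; they are not part of the analysis in this article, which uses adjusted life tables.

```python
def overdiagnosis_probability(mean_lt, mean_other_cause):
    """P(D < LT) when D and LT are independent exponentials with the given means.

    Illustrative assumption: other-cause death has a constant hazard.
    """
    return mean_lt / (mean_lt + mean_other_cause)


def implied_mean_lead_time(p_overdx, mean_other_cause):
    """Invert the formula above: the mean LT consistent with a given overdiagnosis chance."""
    if not 0.0 <= p_overdx < 1.0:
        raise ValueError("p_overdx must be in [0, 1)")
    return p_overdx * mean_other_cause / (1.0 - p_overdx)


# Hypothetical inputs, in years; chosen only to show the round trip.
mean_other_cause_years = 20.0
p = overdiagnosis_probability(4.0, mean_other_cause_years)
recovered = implied_mean_lead_time(p, mean_other_cause_years)
print(p, recovered)
```

Under these assumptions, knowing any two of the three quantities (mean LT, mean other-cause survival, chance of overdiagnosis) determines the third, which is the logic the approach relies on.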

Why is this useful? Because, for invasive breast cancers, the average LT is two to four years based on statistical models[4–9] fit to individual-level screening data. In accordance with a recent[10] summary estimate, we use a mean of 40 months as a consensus value.

Checking whether an overdiagnosis estimate is consistent with a consensus LT provides a check on the plausibility of the estimate. Here, we apply this approach to the estimate of 31% overdiagnosed in 2008. Among breast cancers in the Surveillance, Epidemiology, and End Results (SEER) registry in 2008, classified by SEER historic stage,[11] 22% were in situ, 49% were localized, and 29% were advanced (regional or distant).[12] It is likely that few advanced cancers are overdiagnosed; we assume that none were overdiagnosed in 2008. We also assume initially that all in situ cases were overdiagnosed. The remaining overdiagnosed cases must be localized, amounting to approximately 18% ([31−22]/49) of localized cases. We aim to identify the average LT yielding an overdiagnosis frequency of 18% for these cases. Assuming that all in situ cases are overdiagnosed is conservative, minimizing the overdiagnosis frequency among localized cases and lowering the corresponding LT.
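The stage arithmetic above can be written out as a short calculation. The function and parameter names are illustrative; the inputs (31% overdiagnosed overall, 22% in situ, 49% localized) come from the text, and the optional share of in situ cases assumed overdiagnosed allows the assumption to be relaxed.

```python
def localized_overdiagnosis_fraction(total_overdx_pct, in_situ_pct, localized_pct,
                                     in_situ_overdx_share=1.0):
    """Fraction of localized cases that must be overdiagnosed.

    Assumes, as in the text, that no advanced cases are overdiagnosed, so any
    overdiagnosis not attributed to in situ cases falls on localized cases.
    """
    in_situ_overdx_pct = in_situ_pct * in_situ_overdx_share
    return (total_overdx_pct - in_situ_overdx_pct) / localized_pct


all_in_situ = localized_overdiagnosis_fraction(31, 22, 49)        # 9 / 49
ninety_pct = localized_overdiagnosis_fraction(31, 22, 49, 0.9)    # 11.2 / 49
print(f"{all_in_situ:.1%}", f"{ninety_pct:.1%}")
```

Lowering the share of in situ cases assumed overdiagnosed raises the burden on localized cases, which is why the all-in-situ assumption is the conservative one.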

We estimate other-cause survival using SEER*Stat[13] given the age distribution for localized cases diagnosed in 2008. Because many localized cases have been screened, they have a lower risk of noncancer death than the general population.[14] For each year post diagnosis, we compute the ratio of the observed risk of other-cause death (O) to the expected risk of death in the age-matched population (E) (Table 1) and use this to derive a hazard ratio (HR) to adjust US life tables for this case population. The estimated HR is 0.75 for localized cancers; ie, the annual risk of death is 25% lower among these cases than in the age-matched female population.
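One way such an adjustment could be applied is sketched below. It assumes proportional hazards within each year of age, so that an annual death probability q becomes 1 − (1 − q)^HR. This is an assumption about the mechanics, not a description of the authors' procedure, and the annual probabilities shown are hypothetical placeholders rather than US life-table values.

```python
import math


def adjust_annual_death_probs(q_by_year, hazard_ratio):
    """Scale the hazard behind each annual death probability by a hazard ratio."""
    adjusted = []
    for q in q_by_year:
        hazard = -math.log(1.0 - q)              # cumulative hazard over the year
        adjusted.append(1.0 - math.exp(-hazard_ratio * hazard))
    return adjusted


def survival_curve(q_by_year):
    """Cumulative survival at the end of each year."""
    surviving, curve = 1.0, []
    for q in q_by_year:
        surviving *= 1.0 - q
        curve.append(surviving)
    return curve


# Hypothetical annual death probabilities for an age-matched population.
q_population = [0.010, 0.011, 0.012, 0.013, 0.015]
q_cases = adjust_annual_death_probs(q_population, 0.75)
print(survival_curve(q_population)[-1], survival_curve(q_cases)[-1])
```

Under proportional hazards the adjusted survival curve equals the population curve raised to the power HR, so an HR below 1 shifts other-cause survival upward at every time point.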

Competition between times to clinical diagnosis and other-cause death is implemented via simulation. We generate a virtual population of women with ages as in SEER localized cases diagnosed in 2008. For each woman we simulate two times: time to other-cause death (D) and LT with a specified mean (M). Then we compute the percent overdiagnosed as the empirical fraction of women with D < LT. We vary M to find the value that yields 18% overdiagnosis. We first assume that the LT follows an exponential distribution and also allow distributions with more and less extreme lead times to represent differing frequencies of indolent cancers.
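The simulation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the age mix and the Gompertz other-cause hazard (parameters `a` and `b`) are hypothetical stand-ins for the SEER age distribution and the adjusted US life tables, so the resulting mean LT should not be expected to match Table 2. Lead times are drawn from a Weibull distribution scaled to a chosen mean, with shape 1.0 giving the exponential case, and the mean M is found by bisection using the same random draws at every step.

```python
import math
import random


def simulate_death_time(rng, age, hr=0.75, a=5e-5, b=0.09):
    """Years to other-cause death under a hypothetical Gompertz hazard hr*a*exp(b*(age+t))."""
    e = -math.log(1.0 - rng.random())            # unit-exponential draw
    return math.log(1.0 + b * e / (hr * a * math.exp(b * age))) / b


def unit_lead_time(rng, shape=1.0):
    """Weibull draw rescaled to have mean 1; shape=1.0 is the exponential."""
    scale = 1.0 / math.gamma(1.0 + 1.0 / shape)
    return scale * (-math.log(1.0 - rng.random())) ** (1.0 / shape)


def overdiagnosed_fraction(mean_lt, deaths, unit_lts):
    """Empirical fraction of women with D < LT, where LT = mean_lt * unit draw."""
    return sum(d < mean_lt * z for d, z in zip(deaths, unit_lts)) / len(deaths)


def solve_mean_lead_time(target, deaths, unit_lts, lo=0.0, hi=100.0, tol=1e-3):
    """Bisection on M; reusing the same draws keeps the fraction monotone in M."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if overdiagnosed_fraction(mid, deaths, unit_lts) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


rng = random.Random(2008)
n = 20000
ages = [rng.choice([45, 55, 65, 75]) for _ in range(n)]   # hypothetical age mix
deaths = [simulate_death_time(rng, age) for age in ages]
unit_lts = [unit_lead_time(rng, shape=1.0) for _ in range(n)]

mean_lt_years = solve_mean_lead_time(0.18, deaths, unit_lts)
print(f"mean LT giving 18% overdiagnosed: {12 * mean_lt_years:.0f} months")
```

Changing `shape` to 0.5 or 2.0 in `unit_lead_time` gives lead-time distributions with heavier or lighter tails at the same mean, which is one way to explore sensitivity to the frequency of indolent cancers.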

Table 2 provides mean LT for a range of overdiagnosis frequencies. Under an exponential LT distribution, for 18% of localized cancers to be overdiagnosed, the mean LT must be approximately 108 months. This increases to 136 months if only 90% of in situ cases are overdiagnosed, implying that 23% ([31−19.8]/49) of localized cases must be overdiagnosed. The mean LT for localized cases that yields 18% overdiagnosed is longer under Weibull (shape = 0.5) and slightly shorter under Weibull (shape = 2.0) than under the exponential.

These lead-time estimates apply to all cases; the corresponding lead times among screen-detected cases will be higher, because cases that are not screen detected have a lead time of zero.

Under all settings, the average LT most consistent with 31% of cases overdiagnosed markedly exceeded 40 months. However, since the 40-month estimate applies to all invasive cancers detected by screening, including advanced cancers, we need an estimate that pertains only to localized cancers. Noting that approximately 25% of invasive screen-detected cancers are advanced[15] and assuming that the LT among advanced cancers is short (about six months) implies that we should be comparing our results against an estimate of 51 rather than 40 months (since 40 ≈ 0.75×51 + 0.25×6). Even this value is much lower than the mean lead times from Table 2.
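The 51-month comparison value follows from treating the 40-month consensus as a weighted average over localized and advanced screen-detected cancers and solving for the localized component. A short sketch, with illustrative function and parameter names and inputs taken from the text:

```python
def localized_mean_lead_time(overall_mean, advanced_share, advanced_mean):
    """Solve overall = (1 - share) * localized + share * advanced for the localized mean."""
    return (overall_mean - advanced_share * advanced_mean) / (1.0 - advanced_share)


# 40-month overall mean, 25% advanced, about 6 months of LT among advanced cases.
localized_months = localized_mean_lead_time(40.0, 0.25, 6.0)
print(round(localized_months, 1))
```

The result rounds to 51 months; a shorter assumed LT among advanced cancers would push the localized value only slightly higher, to at most 40/0.75 ≈ 53 months.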

A limitation of our study is that we use standard distributions for the LT. It is possible that some screen-detected cancers would never progress. These cancers will effectively have infinite lead times. The lead-time studies cited[4–9] generally do not explicitly separate these cases from those with a defined, finite LT; rather, these studies specify a single distribution that accommodates longer as well as shorter lead times. Our LT distributions follow these precedents so that our estimated lead times can be compared with the literature.

We conclude that an overdiagnosis rate of 31% among all breast cancer cases in 2008 seems excessive. This may be because of the use of excess incidence, which often yields an overestimate.[1,2] The same reasoning can be applied to examine the plausibility of the estimate of 22% overdiagnosed among invasive cancers in the Canadian breast cancer screening trial.[16] It is commonly believed that excess incidence estimates from clinical trials are a gold standard for estimating overdiagnosis. However, as in population studies, excess incidence estimates from trials can also produce biased results.[17]
