Predictive Probability of Serum Prostate-Specific Antigen for Prostate Cancer: An Approach Using Bayes Rule

Robin T. Vollmer, MD

Am J Clin Pathol. 2006;125(3):336-342.

Materials and Methods

Although the sensitivity and specificity of the PSA level for prostate cancer are common topics of concern, the conditional probability of greatest interest to the patient and his physician is the positive predictive value (PPV). The PPV is defined as the probability of disease, given that the laboratory test result is positive. For serum PSA, PPV is the conditional probability P(Ca | PSA > x), where x represents the cut point in serum PSA used to define a positive result. Bayes rule[16] tells us that PPV can be written as Equation 1:

\[
P(\mathrm{Ca} \mid \mathrm{PSA} > x) = \frac{P(\mathrm{PSA} > x \mid \mathrm{Ca})\,P(\mathrm{Ca})}{P(\mathrm{PSA} > x \mid \mathrm{Ca})\,P(\mathrm{Ca}) + P(\mathrm{PSA} > x \mid \mathrm{B9})\,P(\mathrm{B9})} \tag{1}
\]

On the right side of Equation 1 are 4 probability terms: P(PSA > x | Ca) is the sensitivity of the serum PSA level; P(Ca) is the prior or underlying probability of prostate cancer, that is, the probability without reference to serum PSA; P(PSA > x | B9) is the false-positive probability (FP) of PSA, that is, 1 - specificity; and P(B9) is the prior or underlying probability of not having prostate cancer (B9 denoting benign), which can be written as 1 - P(Ca).

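Dividing the numerator and denominator of the right side of Equation 1 by P(PSA > x | Ca) * P(Ca) and substituting sensitivity, FP, and 1 - P(Ca) where appropriate allows Equation 1 to be simplified to Equation 2:

\[
\mathrm{PPV} = \frac{1}{1 + \dfrac{\mathrm{FP}}{\mathrm{sensitivity}} \cdot \dfrac{1 - P(\mathrm{Ca})}{P(\mathrm{Ca})}} \tag{2}
\]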

Thus, if one can estimate FP, sensitivity, and P(Ca), then one can use Equation 2 to estimate the PPV.
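As a minimal sketch of this calculation, the following Python functions evaluate the PPV from sensitivity, FP, and P(Ca), once in the simplified form and once in the unsimplified Bayes form, so the two can be checked against each other. The function names and the input values are illustrative placeholders and are not taken from the study.

```python
def ppv_simplified(sensitivity: float, fp: float, prior: float) -> float:
    """PPV = 1 / (1 + (FP / sensitivity) * (1 - P(Ca)) / P(Ca))."""
    if not (0.0 < sensitivity <= 1.0 and 0.0 <= fp <= 1.0 and 0.0 < prior < 1.0):
        raise ValueError("sensitivity in (0, 1], fp in [0, 1], prior in (0, 1) required")
    return 1.0 / (1.0 + (fp / sensitivity) * (1.0 - prior) / prior)


def ppv_bayes(sensitivity: float, fp: float, prior: float) -> float:
    """Unsimplified Bayes form: sens * P(Ca) / (sens * P(Ca) + FP * P(B9))."""
    numerator = sensitivity * prior
    return numerator / (numerator + fp * (1.0 - prior))


# Placeholder inputs chosen only to exercise the formula (not study values).
example_simplified = ppv_simplified(sensitivity=0.8, fp=0.2, prior=0.05)
example_bayes = ppv_bayes(sensitivity=0.8, fp=0.2, prior=0.05)
```

Both forms should agree to within floating-point rounding, which is a quick check that the algebraic simplification was carried out correctly.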

To estimate sensitivity, I used previously published results of serum PSA levels from patients with prostate cancer. First, I consolidated the raw sensitivity data published in 4 large studies, each of which gave sufficient details by values of serum PSA from 0 to 20 ng/mL and patient age in 3 groups: 50 to 59, 60 to 69, and 70 to 79 years.[17,18,19,20] Then I summed the numerators and denominators of the raw counts of patients at each value of serum PSA, so that the final plots of sensitivity vs serum PSA were derived from more than 2,700 men with prostate cancer. To obtain a continuous expression of sensitivity as a function of serum PSA, I used a nonlinear least squares algorithm[21] to model the consolidated sensitivity data using the sum of a gamma distribution function and an exponential distribution function (Appendix 1).
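The pooling step, summing numerators and denominators across studies at each PSA value, can be sketched as follows. The data layout (a dictionary per study mapping a PSA cut point to a pair of counts) and the counts themselves are hypothetical and serve only to show the arithmetic. They are not the published data.

```python
from collections import defaultdict


def pool_counts(studies):
    """Pool per-study counts into one proportion per PSA cut point.

    Each study maps a cut point x to (patients with PSA > x, total patients).
    Numerators and denominators are summed before dividing, so larger
    studies carry proportionally more weight.
    """
    numerators = defaultdict(int)
    denominators = defaultdict(int)
    for study in studies:
        for cut_point, (above, total) in study.items():
            numerators[cut_point] += above
            denominators[cut_point] += total
    return {x: numerators[x] / denominators[x] for x in sorted(numerators)}


# Hypothetical counts for two studies (illustrative only).
study_a = {2.0: (90, 100), 4.0: (70, 100)}
study_b = {2.0: (270, 300), 4.0: (180, 300)}
pooled = pool_counts([study_a, study_b])
```

Summing counts before dividing differs from averaging the per-study proportions. Here the pooled value at 4.0 ng/mL is 250/400 = 0.625, whereas the unweighted mean of 0.70 and 0.60 would be 0.65.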

I modeled the FP from previously published data for serum PSA levels from patients without prostate cancer. Specifically, I consolidated the raw false-positive data from 10 previously published studies, each of which gave sufficient details to form distribution functions by values of serum PSA from 0 to 20 ng/mL and patient age for the 3 age groups: 50 to 59, 60 to 69, and 70 to 79 years.[17,18,19,20,22,23,24,25,26,27] When the published specificity data were limited to fewer values of serum PSA, I first verified that the published data for FP followed an exponential distribution and then used an exponential fit to extrapolate for the unpublished values of serum PSA. As before, I summed the resulting numerators and denominators of the counts of patients at each value of serum PSA, so that the final plots of FP vs serum PSA level were derived from more than 99,000 men without prostate cancer. To obtain a continuous expression of FP as a function of serum PSA, I used a nonlinear least squares algorithm[21] to model the consolidated false-positive data with an exponential function (Appendix 1).
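To illustrate the extrapolation step, the sketch below fits an exponential curve to a few FP values and evaluates it at an unpublished PSA value. It makes two assumptions. First, the functional form FP(x) ≈ a * exp(-b * x) is a guess, because the fitted form is given only in Appendix 1. Second, the fit is an ordinary least squares line through ln(FP), a simpler stand-in for the nonlinear least squares algorithm used in the study. The input points are synthetic, generated from a known curve so that the fit can be checked.

```python
import math


def fit_exponential(xs, ys):
    """Fit y ~ a * exp(-b * x) by least squares on (x, ln y); returns (a, b)."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    slope = sxy / sxx
    intercept = mean_log - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic FP values from a known exponential (not study data).
psa_values = [2.0, 4.0, 6.0]
fp_values = [0.9 * math.exp(-0.3 * x) for x in psa_values]

a, b = fit_exponential(psa_values, fp_values)
fp_at_10 = a * math.exp(-b * 10.0)  # extrapolated FP at PSA > 10 ng/mL
```

A fit on the log scale weights small FP values more heavily than a direct nonlinear fit would, so the two approaches can give somewhat different parameters on real, noisy data.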

Although previous publications of PPV for serum PSA values have used values of P(Ca) obtained directly from their data, it is preferable to obtain P(Ca) from a broader population. Otherwise, the resulting PPV might not be generally applicable. The National Cancer Institute-sponsored Surveillance, Epidemiology, and End Results (SEER) data provide estimates of P(Ca) for a broad population. Using the devcan2001 program from the SEER Web site (http://canques.seer.cancer.gov/), I obtained estimates of the cumulative incidence of prostate cancer for men aged 55, 65, and 75 years of 0.959%, 5.015%, and 11.947%, respectively. Dividing these percentages by 100 converts them to probabilities, that is, to values between 0 and 1.
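The conversion is simple, but writing it out also shows the factor (1 - P(Ca)) / P(Ca) that enters the PPV formula. In the short sketch below, the three percentages are the SEER values quoted above, and the variable names are illustrative.

```python
# Cumulative incidence of prostate cancer (percent) by age, as quoted from SEER.
seer_percent_by_age = {55: 0.959, 65: 5.015, 75: 11.947}

# Prior probability P(Ca) for each age.
prior_by_age = {age: pct / 100.0 for age, pct in seer_percent_by_age.items()}

# Prior odds against cancer, (1 - P(Ca)) / P(Ca), the factor that multiplies
# FP / sensitivity in the simplified PPV formula.
odds_against_by_age = {age: (1.0 - p) / p for age, p in prior_by_age.items()}
```

Because the prior odds against cancer fall steeply with age, the same FP-to-sensitivity ratio yields a much higher PPV at 75 years than at 55 years.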

Equations 1 and 2 are formulas for the PPV for prostate cancer, given that the serum PSA level exceeds a certain threshold value, x. Consequently, these formulas can be used to evaluate cut points in serum PSA levels to be used for further clinical actions such as biopsy of the prostate. Of equal interest, however, to a patient with a particular level of serum PSA is the PPV for a narrower range of serum PSA levels. Suppose, for example, we consider the interval of values in serum PSA between x1 and x2. We symbolize this interval as I, which is written as:

I = x1 < PSA ≤ x2

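Bayes rule tells us that the PPV of prostate cancer, given that PSA falls in the interval I, can be written analogously to Equation 1 as Equation 3:

\[
P(\mathrm{Ca} \mid I) = \frac{P(I \mid \mathrm{Ca})\,P(\mathrm{Ca})}{P(I \mid \mathrm{Ca})\,P(\mathrm{Ca}) + P(I \mid \mathrm{B9})\,P(\mathrm{B9})} \tag{3}
\]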

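The rules of probability and simple algebra indicate that the PPV for an interval of PSA values can be simplified to Equation 4. Because P(I | Ca) = P(PSA > x1 | Ca) - P(PSA > x2 | Ca), which is the sensitivity at x1 minus the sensitivity at x2, and likewise P(I | B9) is the FP at x1 minus the FP at x2, dividing the numerator and denominator by P(I | Ca) * P(Ca) gives:

\[
P(\mathrm{Ca} \mid I) = \frac{1}{1 + \dfrac{\mathrm{FP}(x_1) - \mathrm{FP}(x_2)}{\mathrm{sensitivity}(x_1) - \mathrm{sensitivity}(x_2)} \cdot \dfrac{1 - P(\mathrm{Ca})}{P(\mathrm{Ca})}} \tag{4}
\]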
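One way to carry out the interval calculation numerically is sketched below. It relies on the fact that the probability of PSA falling in the interval x1 < PSA ≤ x2 equals the probability of exceeding x1 minus the probability of exceeding x2, for men with cancer and for men without. The two curves `toy_sensitivity` and `toy_fp` are hypothetical placeholders and are not the fitted functions from Appendix 1.

```python
import math


def interval_ppv(sensitivity, fp, prior, x1, x2):
    """PPV for x1 < PSA <= x2, given callables sensitivity(x) and fp(x)."""
    p_interval_cancer = sensitivity(x1) - sensitivity(x2)  # P(I | Ca)
    p_interval_benign = fp(x1) - fp(x2)                    # P(I | B9)
    if p_interval_cancer <= 0.0:
        raise ValueError("interval has no probability mass among men with cancer")
    ratio = p_interval_benign / p_interval_cancer
    return 1.0 / (1.0 + ratio * (1.0 - prior) / prior)


def toy_sensitivity(x):
    """Hypothetical P(PSA > x | Ca); placeholder curve only."""
    return math.exp(-0.1 * x)


def toy_fp(x):
    """Hypothetical P(PSA > x | B9); placeholder curve only."""
    return math.exp(-0.5 * x)


ppv_4_to_10 = interval_ppv(toy_sensitivity, toy_fp, prior=0.05015, x1=4.0, x2=10.0)
```

Passing the curves in as functions keeps the interval formula separate from whatever model is eventually chosen for sensitivity and FP.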