Predicting the Risk of End-stage Renal Disease in the Population-based Setting

A Retrospective Case-control Study

Eric S Johnson; David H Smith; Micah L Thorp; Xiuhai Yang; Juhaeri Juhaeri


BMC Nephrology. 2011;11 


Design Overview and Objective

We conducted a population-based, retrospective case-control study of patients who presented with end-stage renal disease (ESRD) for the first time (i.e., incident cases) and started renal replacement therapy (RRT) while they were members of KPNW. Our objective was to identify the strength and precision of clinician-recorded predictors of RRT in the population-based setting. We also calculated the population-based incidence of RRT to put the odds ratios into context for decision-makers.


We conducted the study in a health maintenance organization, Kaiser Permanente Northwest (KPNW), which serves the Portland, Oregon and Vancouver, Washington metropolitan area. KPNW has an annual membership of approximately 450,000 people. KPNW's electronic medical record, HealthConnect, has served as the sole medical record at all clinics since January 1997. KPNW as a research setting has been described in detail elsewhere.[9] The study was reviewed and approved by KPNW's human subjects committee.

Identification of Cases and Controls

Patients who developed ESRD and started RRT while they were members of KPNW were eligible to serve as cases. We identified patients who were treated with chronic dialysis or had a kidney transplant from January 2000 through December 2004. A nephrologist (MT) confirmed the renal replacement therapy and its start date by reviewing the text of patients' medical records.

We frequency matched controls (10 per case) on year of RRT as well as age and sex. Controls were randomly sampled from the source population using the case's index date (i.e., the same month the case patient started RRT). Patients with a previous diagnosis or treatment for ESRD were not eligible to serve as controls because they were no longer at risk of developing ESRD. We used incidence density sampling to estimate the rate ratio from the entire cohort.[10]
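The control sampling described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `Member` record, its month coding, the five-year age band, and the per-case sampling loop are all assumptions made for the example. The key property of incidence density sampling that it shows is that a member is eligible as a control in any month in which he or she is enrolled and still free of ESRD, even if that member becomes a case later.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    # Hypothetical record layout; months are counted from an arbitrary origin.
    member_id: int
    sex: str
    birth_year: int
    enroll_month: int
    disenroll_month: int
    esrd_month: Optional[int]  # month of first ESRD diagnosis or RRT, None if never


def at_risk(member: Member, month: int) -> bool:
    """True if the member is enrolled and has no ESRD on or before `month`."""
    enrolled = member.enroll_month <= month <= member.disenroll_month
    esrd_free = member.esrd_month is None or member.esrd_month > month
    return enrolled and esrd_free


def sample_controls(case: Member, index_month: int, members: List[Member],
                    n_controls: int = 10, age_band: int = 5,
                    rng: Optional[random.Random] = None) -> List[Member]:
    """Draw controls at random from the risk set in the case's index month."""
    rng = rng or random.Random()
    risk_set = [
        m for m in members
        if m.member_id != case.member_id
        and m.sex == case.sex
        and abs(m.birth_year - case.birth_year) <= age_band  # assumed age band
        and at_risk(m, index_month)
    ]
    return rng.sample(risk_set, min(n_controls, len(risk_set)))
```

Sampling from the risk set at each case's index month is what allows the odds ratio from the case-control data to estimate the rate ratio in the underlying cohort.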

Index Dates and Eligibility

All patients were assigned an index date, which was either the date they started RRT or the date they were sampled to be a control. Risk factors for ESRD were measured only during the baseline period (1997 through 1999) to ensure that the information could have predicted ESRD. Both case and control patients met the following eligibility criteria:

  • Celebrated their 20th birthday by the index date (with no upper age limit).

  • Contributed continuous membership in KPNW since January 1999 (until the index date).

  • Maintained prescription drug coverage through KPNW since January 1999 (until the index date).

All patients were required to be members of KPNW during 1999; however, many patients were members as early as 1997 and 1998. The population-based incidence calculations used the same eligibility criteria.
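As an illustration only, the three eligibility criteria could be checked with a function like the one below. The argument names are hypothetical, and the sketch assumes that each patient's continuous membership and prescription drug coverage are summarized by the start and end dates of the current uninterrupted spell; it also assumes "since January 1999" means the spell began on or before 1 January 1999.

```python
from datetime import date

BASELINE_CUTOFF = date(1999, 1, 1)  # assumed reading of "since January 1999"


def age_on(birth_date: date, on_date: date) -> int:
    """Completed years of age on `on_date`."""
    had_birthday = (on_date.month, on_date.day) >= (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - (0 if had_birthday else 1)


def is_eligible(birth_date: date, index_date: date,
                membership_start: date, membership_end: date,
                drug_coverage_start: date, drug_coverage_end: date) -> bool:
    """Apply the three criteria; spell dates are hypothetical inputs."""
    adult = age_on(birth_date, index_date) >= 20
    continuous_member = (membership_start <= BASELINE_CUTOFF
                         and membership_end >= index_date)
    continuous_coverage = (drug_coverage_start <= BASELINE_CUTOFF
                           and drug_coverage_end >= index_date)
    return adult and continuous_member and continuous_coverage
```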

Data Collection

We measured possible predictors of RRT using the coded information in the electronic medical record and the laboratory values (as noted for each characteristic in Table 1). We divided the characteristics into three categories: 1) demographic characteristics for matching; 2) clinical history; and 3) the most recent laboratory measures during the baseline period.

Sample Size Considerations

The sample size was fixed because of the retrospective design. We observed 485 RRT starts during the period from 2000 through 2004. The effective sample size for multivariable analyses was reduced from 485 to 350 events because of missing data. One statistical approach for considering the adequacy of the sample size for predictive modelling is to consider the total number of events in relation to the number of candidate predictor characteristics and their degrees of freedom.[11] Experts recommend approximately 10 to 20 events per degree of freedom. Consequently, 350 events allowed us to consider 18 to 35 degrees of freedom.
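The arithmetic behind the 18 to 35 degrees of freedom can be reproduced in a few lines. The function name is ours, and rounding 17.5 up to 18 is an assumption made to match the figure reported above.

```python
import math


def degrees_of_freedom_budget(events: int,
                              min_events_per_df: int = 10,
                              max_events_per_df: int = 20) -> tuple:
    """Range of candidate degrees of freedom supported by `events`."""
    # Stricter rule (20 events per df): 350 / 20 = 17.5, rounded up to 18.
    conservative = math.ceil(events / max_events_per_df)
    # Looser rule (10 events per df): 350 / 10 = 35.
    liberal = events // min_events_per_df
    return conservative, liberal


print(degrees_of_freedom_budget(350))
```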

Statistical Analysis

Our objective was to identify the strength and precision of characteristics that predict RRT. Consequently, we modeled the data using predictive methods instead of explanatory methods.[12] We recognize that a characteristic that predicts RRT is not necessarily a cause of it: randomly allocating patients to the characteristic would not necessarily raise their rate of RRT. We analyzed the data using logistic regression to calculate the odds ratios and 95% confidence intervals after controlling for matching characteristics: age, sex, and calendar time.
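For readers unfamiliar with the quantities reported, the sketch below computes a crude odds ratio and a Wald (Woolf) 95% confidence interval from a single 2x2 table. It is a deliberate simplification: it is unadjusted, whereas the analysis described above used logistic regression controlling for age, sex, and calendar time. The counts in the usage line are hypothetical and are not study data.

```python
import math


def crude_odds_ratio(exposed_cases: int, unexposed_cases: int,
                     exposed_controls: int, unexposed_controls: int,
                     z: float = 1.96) -> tuple:
    """Unadjusted odds ratio with a Wald confidence interval on the log scale."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR) from the four cell counts (Woolf's method).
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
print(crude_odds_ratio(20, 80, 10, 90))
```

The width of the interval is what the text above calls precision: larger cell counts shrink the standard error and narrow the interval around the same point estimate.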

To select characteristics for evaluation in the logistic regression model, we started with clinical characteristics that are frequently and reliably measured in routine clinical practice (e.g., hypertension). We proceeded to characteristics that may be less reliably measured (i.e., greater measurement error in the ICD-9-CM coded electronic medical record), but which are probably strong predictors of the outcome (e.g., history of clinically recognized diabetes). We retained characteristics in the equations if they were statistically significant (P < 0.05).

The pattern of missing data is complicated in a retrospective study where collection depends on patients' and providers' decisions to schedule outpatient visits and to measure characteristics (e.g., order a laboratory test). When investigators cannot impute missing clinical values (because they are not missing at random), one of the less biased methods is to analyze the subgroup of patients with complete data for all characteristics, the method we adopted.[13]

To calculate the population-based incidence of RRT, we summed the person-years that members were at risk of becoming RRT cases. We then divided the number of RRT cases by the total person-time at risk and calculated the exact Poisson 95% confidence interval.
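A minimal sketch of this calculation follows, using only the Python standard library. The exact limits are found by bisection on the Poisson distribution function, which is one common way to obtain them and may differ from the software the authors used. The count of 485 cases comes from the text above; the person-years denominator in the usage line is a placeholder, because the total is not stated in this section.

```python
import math


def poisson_cdf(n: int, mean: float) -> float:
    """P(X <= n) for X ~ Poisson(mean), with terms computed in log space."""
    if n < 0:
        return 0.0
    if mean <= 0:
        return 1.0
    log_mean = math.log(mean)
    total = sum(math.exp(i * log_mean - mean - math.lgamma(i + 1))
                for i in range(n + 1))
    return min(1.0, total)


def solve_mean(n: int, target: float, iterations: int = 100) -> float:
    """Find the mean at which P(X <= n) equals `target` (CDF falls as mean rises)."""
    lo, hi = 0.0, max(1.0, float(n))
    while poisson_cdf(n, hi) > target:
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if poisson_cdf(n, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_poisson_ci(count: int, alpha: float = 0.05) -> tuple:
    """Exact two-sided confidence limits for a Poisson count."""
    lower = 0.0 if count == 0 else solve_mean(count - 1, 1 - alpha / 2)
    upper = solve_mean(count, alpha / 2)
    return lower, upper


def incidence_with_ci(cases: int, person_years: float,
                      per: float = 100_000, alpha: float = 0.05) -> tuple:
    """Incidence rate and exact limits, expressed per `per` person-years."""
    lower, upper = exact_poisson_ci(cases, alpha)
    scale = per / person_years
    return cases * scale, lower * scale, upper * scale


# 485 cases is from the text; 1,000,000 person-years is a placeholder value.
print(incidence_with_ci(485, 1_000_000))
```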

