Towards the Best Kidney Failure Prediction Tool

A Systematic Review and Selection Aid

Chava L. Ramspek; Ype de Jong; Friedo W. Dekker; Merel van Diepen


Nephrol Dial Transplant. 2020;35(9):1527-1538. 

Materials and Methods

Data Sources and Searches

This review was framed around the search for prognostic prediction models that predict the future occurrence of kidney failure in CKD patients. To ensure transparent reporting and accurate study appraisal, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) and Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) guidelines were followed where applicable.[8–10] The completed PRISMA checklist is provided as supplementary material. We searched the PubMed and EMBASE databases on 31 December 2017 for English-language studies on risk prediction in CKD patients. The search strategies were designed to capture relevant development, validation and implementation studies and are provided in Appendix A1.

Study Selection

Titles, abstracts and full-text articles were sequentially screened for inclusion by two independent researchers (C.L.R. and Y.J.). Discrepancies on inclusion of full-text articles were resolved by consulting a third co-author (M.D.). Articles were included if they met the following predefined selection criteria: (i) the study must develop, validate, update or implement a multivariable prognostic prediction model, with a prediction research question as the aim, as opposed to an aetiological or methodological goal; (ii) the study must present at least one measure to assess model performance; (iii) the study population must consist of adult CKD patients and (iv) the study outcome must include kidney failure or end-stage renal disease. The references of included studies and related reviews were manually screened to identify additional relevant studies.

Data Extraction and Quality Assessment

Following selection, two reviewers (C.L.R. and Y.J.) independently conducted the data extraction and quality assessment. Discrepancies were discussed with input from an additional co-author (M.D.) where necessary. In line with CHARMS recommendations, information on the source of the data, population, outcome, sample size, missing data, model development and model performance was extracted and summarized. Additionally, data on external validations of models were extracted. Furthermore, risk of bias and clinical usefulness were judged by both reviewers independently. To facilitate further comparison, studies were grouped by study population, which ranged from very broad (general CKD) to specific CKD subgroups such as immunoglobulin A (IgA) nephropathy or diabetic nephropathy. Quality and risk of bias were assessed in both development and validation studies using a novel tool, the Prediction Study Risk of Bias Assessment Tool (PROBAST). Although this tool has yet to be published in its complete form, no other formal risk of bias assessment applicable to prediction studies is available. PROBAST is specifically designed for systematic reviews of prediction studies and takes a domain-based approach: 23 signalling questions categorize the risk of bias as high, low or unclear across five separate domains (participant selection, predictors, outcome, sample size and missing data, and analysis). It also assesses the usability of a model. It has been used in multiple reviews in the past year and was presented in part at the 2016 Cochrane Colloquium.[11] The final test version of PROBAST was obtained through personal e-mail contact with Dr R.C.G. Wolff.
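The domain-based judgement structure described above can be sketched in code. This is a hypothetical illustration, not the PROBAST tool itself: the domain names and the high/low/unclear categories come from the description in the text, but the aggregation rule (any high-risk domain yields a high-risk overall rating) is an assumption for illustration only.

```python
from enum import Enum


class Risk(Enum):
    """Risk-of-bias categories used per domain (as described in the text)."""
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


# The five domains named in the text.
DOMAINS = (
    "participant selection",
    "predictors",
    "outcome",
    "sample size and missing data",
    "analysis",
)


def overall_risk(judgements):
    """Aggregate per-domain judgements into one overall rating.

    NOTE: this aggregation rule is an assumption for illustration;
    the actual PROBAST guidance may differ.
    """
    values = [judgements[d] for d in DOMAINS]
    if Risk.HIGH in values:
        return Risk.HIGH
    if Risk.UNCLEAR in values:
        return Risk.UNCLEAR
    return Risk.LOW


# Hypothetical study: one unclear domain, rest low risk.
study = {d: Risk.LOW for d in DOMAINS}
study["sample size and missing data"] = Risk.UNCLEAR
print(overall_risk(study).value)  # → unclear
```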

Data Synthesis

Given the multitude of different models and the heterogeneity in study characteristics, we opted for a narrative synthesis of results, supported by extensive tables and figures with study characteristics listed per article. Model performance was evaluated by examining the discrimination and calibration of the included prediction tools. Discrimination, most often described by the C-statistic, indicates how well a model distinguishes between patients with and without the event of interest. It lies between 0.5 and 1, where 0.5 is equivalent to tossing a coin and 1 indicates perfect discrimination.[12] It is important to note that the C-statistic of the same model can vary greatly depending on the population in which the model is tested: when a population is heterogeneous in the predictors that make up the prediction tool, the C-statistic may increase substantially.[13] Calibration, on the other hand, describes the agreement between the absolute number of predicted events and the number of observed events across the whole population. It is best represented in a calibration plot, in which the predicted probability of kidney failure is plotted against the observed rate of kidney failure.[12] To evaluate the sample size and risk of overfitting in development studies, the events per candidate predictor (EPV) were extracted. A minimum of 10 EPV has been suggested as a rule of thumb for an acceptable sample size in model development studies.[14] For external validation studies, a minimum of 100 events in total has been recommended to obtain a precise estimate of performance.[15]
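The two quantities above can be made concrete with a short sketch. The C-statistic is the proportion of all (event, non-event) patient pairs in which the patient who had the event received the higher predicted risk; the EPV is simply the event count divided by the number of candidate predictors. The data below are hypothetical and for illustration only.

```python
def c_statistic(risks, events):
    """Concordance statistic: fraction of (event, non-event) pairs in
    which the patient with the event received the higher predicted risk.
    Tied predictions count as half-concordant."""
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def events_per_variable(n_events, n_candidate_predictors):
    """EPV; a minimum of 10 has been suggested as a rule of thumb
    for model development studies."""
    return n_events / n_candidate_predictors


# Hypothetical cohort: predicted kidney-failure risks and observed events.
risks = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
events = [1, 1, 0, 1, 0, 0]
print(round(c_statistic(risks, events), 2))   # → 0.89

# Hypothetical development study: 50 events, 8 candidate predictors.
print(events_per_variable(50, 8))             # → 6.25 (below the 10-EPV rule of thumb)
```

This quadratic pairwise loop is fine for illustration; for real cohorts one would typically use an optimized library routine (e.g. a rank-based AUC implementation), which computes the same quantity.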