Towards the Best Kidney Failure Prediction Tool

A Systematic Review and Selection Aid

Chava L. Ramspek; Ype de Jong; Friedo W. Dekker; Merel van Diepen


Nephrol Dial Transplant. 2020;35(9):1527-1538. 

This systematic review provides an overview of all development and validation studies of predictive models for progression of CKD to kidney failure. Since the last reviews on this topic, the number of publications has more than doubled.[3] Most included studies report high model performance measures, implying that calculating an individual's risk of renal failure with high accuracy is attainable. This is further emphasized by the similar predictors included in various models. There were, however, substantial shortcomings in many publications. As in many medical prediction studies, aetiological and prediction goals were often confused, limiting interpretability and applicability.[7,58] First, more than half the tools provided insufficient detail to calculate an individual's prognosis of kidney failure, rendering them unusable for their intended purpose. Second, the clinical relevance of many models is limited by the selection of the derivation population. Third, a high risk of bias was observed across studies, mainly due to a high risk of overfitting, inadequate handling of missing data and incomplete reporting of performance measures. Fourth, sufficient validation was largely lacking, increasing research waste and limiting the reliability of models. Finally, not a single impact study on the effect of clinical uptake has been performed. It is therefore not surprising that clinical uptake of models remains sporadic and guidelines on which model to use are lacking.

Providing definitive evidence for a single 'best' prognostic tool is complicated by differences between studies, mainly in study population, prediction baseline, time frame and outcome definition. A selection guide including all usable models is presented that may assist clinicians and patients in choosing the tool appropriate to their setting (Figure 5). Many factors must be taken into account when selecting the most appropriate model, depending on the user's wishes and specific clinical setting. Users should be wary of overfitting in models developed on a small sample, and we would advise against using these models unless they have been validated in a sufficiently large sample. Based on our results, we would advise using a tool with an overall low risk of bias that has shown good performance in external validation in a population similar to that in which use is intended and that has ideally been assessed in an impact study.

For kidney failure prediction in a general CKD cohort of Stages 3–5 patients, we would recommend the four- or eight-variable KFRE, as it has been extensively externally validated for time frames of 2 and 5 years. Although the development study potentially introduced bias by selecting predictors recorded up to 365 days after the prediction baseline and by using univariate analysis to select predictors, the model has shown consistently good performance in CKD Stages 3–5 patients in less-biased external validation studies.[18,34] Alternatively, for 5-year predictions, the Kaiser Permanente Northwest (KPNW) model as updated and externally validated by Schroeder et al.[24] also has great potential, mainly due to its methodological rigor and low risk of bias, although it is less easy to use than the KFRE. Various other general CKD models showed promising results in development but should be further externally validated to ensure consistent performance before clinical use.[26,28,32] For prediction of disease progression in IgA nephropathy patients, a large number of models are available. However, these models were generally developed on small sample sizes and often had a high risk of bias. The most evidence on validity was found for the risk scores developed by Goto et al.[51] and the ARR (Berthoux et al.[48]). The Goto score carries some risk of bias due to a complete case analysis and univariate selection of predictors, but it was developed on a relatively large sample and has been externally validated twice. Although the ARR score was developed using questionable model-building methods and with incomplete reporting of performance, it has been externally validated most often, and a recently updated version presented by Knoop et al.[21] shows great potential.

Clinical relevance proved to be largely lacking for many of the included models. Specifically, models for general CKD patients were often developed on prevalent patients with a wide range of disease severity and did not specify a time point at which the model should be used. Prediction of renal failure can be extremely accurate in a population with GFRs ranging from 10 to 60 mL/min/1.73 m², but in practice such tools would probably be employed in a more homogeneous group of patients in whom it is clinically relevant to discuss prognosis, and the predictive capacity of the model would be lower in such a population. This is exemplified in the KFRE validation performed by Peeters et al.,[17] where the area under the curve of the four-variable KFRE decreased dramatically from 0.88 in the whole population (CKD Stages 3–5) to 0.71 in the more relevant population of CKD Stage 4 patients. Another factor limiting usability and interpretability is that a number of studies did not define the prediction time frame. Finally, the definition of the outcome differs between studies. The use of composite endpoints is particularly problematic, as it limits the value of a model for clinicians: each separate endpoint requires different interventions. In conclusion, an ideal model is developed for one clearly defined, clinically meaningful and objective endpoint in a population for which prediction is clinically relevant. Few models included in this review met these recommendations, and this lack of clinical relevance could be a large contributor to the slow uptake seen in practice.
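The dependence of discrimination on case mix can be reproduced with a small simulation: the same risk score, evaluated in a narrower disease-severity range, yields a lower AUC even though the score itself is unchanged. The logistic risk model, slope and eGFR ranges below are illustrative placeholders, not the KFRE.

```python
import math
import random

random.seed(42)

def simulate(n, gfr_low, gfr_high):
    """Simulate (risk score, outcome) pairs. The score is simply -eGFR,
    and the true outcome risk rises as eGFR falls (illustrative logistic
    model; slope and ranges are hypothetical, not the KFRE)."""
    data = []
    for _ in range(n):
        gfr = random.uniform(gfr_low, gfr_high)
        p = 1.0 / (1.0 + math.exp(0.15 * (gfr - 30.0)))  # risk of kidney failure
        outcome = 1 if random.random() < p else 0
        data.append((-gfr, outcome))  # lower eGFR -> higher risk score
    return data

def auc(data):
    """Rank-based AUC: probability that a random case outranks a random control."""
    cases = [s for s, y in data if y == 1]
    controls = [s for s, y in data if y == 0]
    wins = sum(1.0 for c in cases for k in controls if c > k)
    return wins / (len(cases) * len(controls))

auc_full = auc(simulate(2000, 10, 60))    # wide, Stages 3-5-like range
auc_stage4 = auc(simulate(2000, 15, 30))  # narrower, Stage 4-like range
```

In this sketch the AUC in the narrow subgroup comes out substantially lower than in the full range (the exact values depend on the seed), mirroring the pattern of the 0.88 versus 0.71 drop reported by Peeters et al.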

Despite the limited uptake and the shortcomings discussed, risk prediction models for kidney failure have great potential for improving patients' decision making, treatment and overall health. Future studies need to improve both the quality of reporting and the methodology used. As the majority of included models had a high risk of bias, these models should not be implemented unless their validity is proven in unbiased external validation studies. Hopefully, efforts such as the TRIPOD guidelines will correct these inadequacies and result in more robust, usable and unbiased prognostic tools.[9] To limit research waste and improve clinical uptake, it is crucial that development studies provide enough model information (a formula or score with an absolute risk table) to enable their use. For specific renal diseases and homogeneous patient populations, there certainly appears to be room for improvement in model development. For populations in which multiple models are available, we advise that future research focus on updating, validating and implementing these existing prognostic tools. Previous studies have shown that the combination of well-established clinical risk factors and kidney disease markers can accurately predict renal failure in a general CKD population. Therefore one might advise focusing resources on updating models for more clinically relevant populations in an unbiased fashion. In this step, external validation of multiple models in the same population is of key importance. Additionally, translating mathematical model formulas into simple tools such as web calculators, and enabling automated uptake, is of great importance for integration into the daily clinical routine. Ultimately, impact studies will be necessary to determine whether the implementation of such tools truly improves patient outcomes.
Ideally, such impact studies would be randomized controlled trials assessing the effect of implementing a prediction model in clinical practice. Different outcomes might be considered as endpoints in such studies, partly depending on the time of prediction. Relevant outcomes include timely referral to a nephrologist, timely placement of vascular access, better-informed patients, improved quality of life and possibly even improved survival.

The current review has a number of strengths. First, we expect to have provided a complete overview of existing models. Furthermore, this is the first study on kidney failure models to perform a formal risk of bias assessment aimed specifically at prediction research. The study is limited by the inclusion of only English-language articles. Also, differences in case mix and in the characteristics of included studies make it difficult to compare their performances directly; here we are limited by the lack of validation studies that compare multiple models in the same cohort. Finally, we limited the scope of this review to models predicting kidney failure, although other outcomes such as death or cardiovascular events may also have significant clinical value.

In conclusion, this study provides a systematic overview of existing models for predicting progression to kidney failure in CKD patients. The results may be used as a tool to select the most appropriate and robust prognostic model for various settings. Finally, we hope the current review motivates researchers in this field to decrease the generation of new models and combine efforts to explore, analyse and update existing models in clinically relevant settings to ultimately stimulate clinical uptake and improve patient outcomes.