Providing Guidance in the Dark: Rare Renal Diseases and the Challenge to Improve the Quality of Evidence

Davide Bolignano; Evi V. Nagler; Wim Van Biesen; Carmine Zoccali


Nephrol Dial Transplant. 2014;29(9):1628-1632. 

How can the Quality of Evidence be Enhanced in Rare Renal Diseases?

From a statistical point of view, the number of subjects recruited in a trial should be sufficient to reject the null hypothesis (no difference in effect between the interventions) in favour of the alternative hypothesis (a difference in effect exists between the interventions) at a given level of significance. The probability of reaching a true positive conclusion is known as the study power, which can be maximized by increasing the number of participants.[5] Since enrolling large numbers of patients with rare diseases is challenging, alternative strategies to demonstrate a particular effect, if in reality there is one, have been proposed.[6] Some of these techniques can be considered 'neutral', such as (i) use of a crossover design, (ii) performance of repeated measurements and (iii) analysis of covariance instead of a single comparison of outcomes between groups. Others are more dubious and do not always lead to more or better evidence, such as the use of composite or surrogate outcomes and interim analyses.

Other 'tailored' designs have also been proposed to reduce the sample needed to reach statistical significance. In 'sequential' designs, for instance, repeated statistical analyses are performed on accumulating data, and the study is stopped as soon as the information is sufficient to reach a conclusion.[7] Study designs also exist that do not require a pre-study calculation of sample size. In this 'adaptive' approach, modifications after interim analyses are permitted: on the basis of the results observed at a certain time point, the sample size required for further recruitment can be re-assessed.[8] Such an adaptive design was adopted in the PRIMO (Paricalcitol capsules benefits in Renal failure-Induced cardiac MOrbidity) study.[9] Because prior data for estimating the sample size were limited, an interim efficacy analysis at a pre-specified time point, conditioned on the status of enrollment, was planned to allow sample size re-estimation. The decision to increase the sample size was based on the observed treatment effect of paricalcitol on left ventricular hypertrophy. However, all these methodological tips and tricks constitute a slippery slope, and their results should always be interpreted critically.
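The re-estimation step of an adaptive design can be sketched as follows. This is a hypothetical illustration, not the PRIMO calculation: the effect sizes, significance level and power are invented, and the standard normal-approximation sample-size formula for comparing two means is assumed.

```python
# Hypothetical sketch of interim sample-size re-estimation in an adaptive
# design: the effect observed at the interim look replaces the uncertain
# planning assumption, and the recruitment target is recomputed.
# All numbers are illustrative, not taken from any real trial.
from statistics import NormalDist

def n_per_group(effect, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided
    two-sample comparison of means; 'effect' is the standardized
    difference (Cohen's d)."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return int(round(2 * (z_total / effect) ** 2))

planned = n_per_group(effect=0.5)    # design assumption: 0.5 SD difference
observed = n_per_group(effect=0.35)  # weaker effect seen at the interim look
print(planned, observed)             # recruitment target is revised upward
```

Note that such data-driven revisions must be pre-specified and statistically accounted for, or they inflate the type I error rate — the "slippery slope" the text warns about.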

In 'open-ended' RCTs, patients can be enrolled continuously until a reliable positive or negative conclusion about the treatments is reached. The single-patient (n-of-1) clinical trial is another interesting option in at least some diseases. This design is a multiple crossover study in which a single enrolled patient is exposed repeatedly, in randomized order, to the different treatments. Unlike in traditional trials, the unit of randomization is the intervention rather than the patient, and the outcome is a conclusion about the best treatment for that particular patient.[10] Hackett et al. conducted a study with this design in a patient with ornithine transcarbamylase (OTC) deficiency, a rare X-linked disorder of the urea cycle with variable clinical manifestations in females, including end-stage renal disease. The patient, an obligate OTC carrier, and the treating physician were blinded to treatment: either placebo capsules or L-arginine capsules were given in weekly periods. Plasma arginine and glutamine levels and changes in quality of life served as weekly efficacy indicators. At the end of the study, a clear benefit of L-arginine over placebo was demonstrated for that particular patient.

Other acceptable statistical approaches to trials in small populations include the adoption of less conservative statistical cut-off levels, such as P-values above 5%, and the use of one-sided instead of two-sided tests of significance.[11] Although one-sided tests are commonly regarded with suspicion, they can be used to maximize the power of effect detection when the direction of the expected effect is known exactly. For example, imagine we want to test a new drug to prevent renal stone formation in patients affected by cystinuria.
The drug is cheaper than the other available drugs, and previous evidence shows that it prevented lithiasis in all cases. Under these circumstances, given that we know the direction of the effect exactly (only improvement, no worsening), it would be appropriate to use a one-sided test, which requires a smaller sample size than a traditional two-sided test. Less conservative approaches can also be applied to the assessment of study power, starting from the premise that confidence intervals (CIs) of the treatment effect estimate, although closely linked to statistical significance, are much more informative than P-values. Generally, for a given treatment effect, the CI narrows as the sample size increases. When the recruitment of a large number of patients is difficult, as in rare diseases, the sample size can therefore be reduced at the cost of a reasonable loss in the precision of the effect estimate (see Figure 1).
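The sample-size saving of a one-sided test can be quantified with the normal-approximation formula for a two-sample comparison of means. This is a sketch with invented inputs: the 0.4 SD standardized effect, 5% significance level and 80% power are assumptions for illustration, not values from the text.

```python
# Illustrative comparison of the sample size required by two-sided vs
# one-sided tests at the same significance level; a one-sided test
# spends all of alpha in the expected direction, so its critical value
# is smaller and fewer patients are needed. Numbers are hypothetical.
from statistics import NormalDist

def n_per_group(effect, alpha=0.05, power=0.80, two_sided=True):
    """Normal-approximation sample size per group for a two-sample
    comparison of means with standardized effect 'effect'."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return int(round(2 * ((z_alpha + z_beta) / effect) ** 2))

print(n_per_group(0.4, two_sided=True))   # conventional two-sided test
print(n_per_group(0.4, two_sided=False))  # one-sided: smaller sample needed
```

For these assumed inputs the one-sided design needs roughly a fifth fewer patients per group — a meaningful saving in a rare disease, but defensible only when, as the text stresses, the direction of the effect is known in advance.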

Figure 1.

Relaxing the precision of effect estimation when calculating the power of a study as a way to optimize the sample size in small populations. Here, we consider a hypothetical RCT in a rare renal disease with a prevalence of 1:50 000, in which a treatment is assumed to produce a 20% reduction in the risk of developing end-stage renal disease. The study power is fixed at 80%. Setting the confidence interval at 99% (maximum precision), we would need 600 patients to observe the above-cited effect. However, the number of subjects to be enrolled can be reduced substantially (at the cost of lower precision) if wider CIs are accepted: 400 at the 95% level, 300 at 90%, 260 at 85% and even 220 at the 80% level of precision.
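The trade-off illustrated in the figure can be sketched in simplified form. The code below does not reproduce the figure's numbers, which depend on the specific outcome model; it uses an assumed standard deviation and margin of error to show the same qualitative pattern, namely that the sample size needed to estimate an effect within a fixed margin shrinks as the confidence level is relaxed.

```python
# Hypothetical sketch of the precision/sample-size trade-off: the n
# required for a confidence interval of fixed half-width falls as the
# confidence level drops, because the critical z-value gets smaller.
# sigma and margin are invented for illustration.
from statistics import NormalDist

def n_for_ci(conf_level, sigma=1.0, margin=0.2):
    """n so that the (conf_level) CI for a mean has half-width <= margin,
    under the normal approximation n = (z * sigma / margin)**2."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return int(round((z * sigma / margin) ** 2))

for level in (0.99, 0.95, 0.90, 0.85, 0.80):
    print(level, n_for_ci(level))  # n decreases monotonically with level
```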