What Does It Mean for a Recommendation to Be Evidence-Based?

Robert L. Schmidt, MD, PhD, MBA; Rachel E. Factor, MD, MHS


Lab Med. 2019;50(1):5-7. 

Clinical laboratories are under increasing pressure to demonstrate value.[1] Traditionally, value has been defined in terms of performance measures that are under the direct control of the laboratory, such as analytical performance, cost-efficiency, and operational performance (eg, turnaround time). However, laboratories can have a significant influence over the entire diagnostic process, from preanalytical test selection to postanalytical test interpretation, all of which can have a significant impact on value. For that reason, there is increasing scrutiny of the laboratory's role in the diagnostic process.

Recent studies have shown that there is wide variation in diagnostic practice, which is potentially harmful. This variation can be minimized if laboratories follow evidence-based recommendations.[2,3,4]

What do we mean by evidence-based recommendations and how do we arrive at them? The term evidence-based refers to a decision process following a theoretic framework that determines relevant outcomes, prioritizes their importance, estimates the probability that each outcome will occur, estimates the costs and benefits of each outcome, evaluates whether the benefits of an intervention outweigh its harms and, finally, uses this information as a rationale for 1 or more recommendations regarding the intervention. All the steps in the process should be systematic and transparent so that they can be evaluated.
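As a rough illustration of the weighing step in this framework, the following Python sketch sums probability-weighted benefits and harms for a set of outcomes. It is a simplified sketch rather than a prescribed method: the `Outcome` class, the `expected_net_benefit` function, the outcome names, and all of the numbers are hypothetical, and a single common scale for benefits and harms is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Outcome:
    """One relevant outcome of an intervention (hypothetical structure)."""
    name: str
    probability: float  # estimated probability that the outcome occurs
    value: float        # benefit (> 0) or harm/cost (< 0), arbitrary common units


def expected_net_benefit(outcomes: Iterable[Outcome]) -> float:
    """Sum of probability-weighted values across all outcomes."""
    total = 0.0
    for outcome in outcomes:
        if not 0.0 <= outcome.probability <= 1.0:
            raise ValueError(f"invalid probability for {outcome.name!r}")
        total += outcome.probability * outcome.value
    return total


# Illustrative numbers only; they are not taken from any study.
example_outcomes = [
    Outcome("earlier correct diagnosis", probability=0.10, value=50.0),
    Outcome("false-positive follow-up testing", probability=0.20, value=-5.0),
    Outcome("serious complication of follow-up", probability=0.01, value=-100.0),
]

net = expected_net_benefit(example_outcomes)
print("benefits outweigh harms" if net > 0 else "harms outweigh or equal benefits")
```

A calculation of this kind can only inform the rationale for a recommendation. The prioritization of outcomes and the values assigned to them remain explicit judgments that should be reported transparently.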

Evidence-based decision making is not strictly algorithmic and recognizes that each situation is unique. Evidence-based recommendations provide a starting point that can be modified by the clinical context. Thus, evidence-based decisions must incorporate patient preferences and clinical judgment.

There are many systems for producing evidence-based recommendations, which has led to confusion about the meaning of the term evidence-based. Recently, the GRADE (Grading of Recommendations Assessment, Development and Evaluation) working group developed a systematic and explicit approach for developing evidence-based guidelines[5] that has been widely adopted. Our discussion will focus on the GRADE approach.

Although tests are used to provide information for a wide range of purposes (eg, diagnosis, prognosis, screening, and monitoring), the GRADE process is flexible and can be applied to developing recommendations for all of these purposes. Also, the GRADE process is applicable to all medical interventions and is not limited to clinical laboratory tests.

The first step in the process is to formulate the problem, usually as a question, and identify the important outcomes.[6] Clinical problems can be described using the PICO (population, intervention, comparison, outcome) framework modified for testing (population, index test, comparator or reference test, and outcome). This step is critical because it focuses the task and determines the type of evidence that is required.
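The testing version of the PICO framework can be written down as a small structured record, which makes it easier to check that every element of the question has been specified. The Python sketch below is one possible representation. The `TestingQuestion` class, its field names, and the example values are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TestingQuestion:
    """PICO elements modified for testing (hypothetical structure)."""
    population: str
    index_test: str
    comparator: str            # comparator or reference test
    outcomes: Tuple[str, ...]  # the important outcomes identified up front

    def as_question(self) -> str:
        """Render the elements as a single focused question."""
        if not self.outcomes:
            raise ValueError("at least one outcome must be specified")
        return (
            f"In {self.population}, does {self.index_test}, "
            f"compared with {self.comparator}, "
            f"affect {' and '.join(self.outcomes)}?"
        )


# Placeholder values, not drawn from any guideline.
question = TestingQuestion(
    population="adults with suspected infection",
    index_test="test A",
    comparator="test B",
    outcomes=("time to treatment", "length of stay"),
)
print(question.as_question())
```

Writing the question in this form also makes the evidence requirements explicit: each named outcome becomes a target for the evidence search in the next step.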

The second step is to gather evidence. Data gathering should be systematic, comprehensive, and reproducible, following the well-established procedures for a systematic review.

The next step evaluates the quality of the evidence, which often depends on study type. Filtered evidence (systematic reviews and meta-analyses) is generally considered to be of higher quality than unfiltered evidence (individual studies). Among individual studies, controlled studies are considered to contribute stronger evidence than observational studies which, in turn, are considered stronger than expert opinion, case studies, and narrative reviews. Thus, randomized controlled trials (RCTs) are considered more reliable than observational designs such as cohort and case-control studies. The hierarchy of evidence is primarily based on susceptibility to bias (eg, spectrum bias, operational bias, selection bias). Systematic reviews of RCTs are at the top of the hierarchy because they are least susceptible to bias. However, the design and execution of each study can influence susceptibility to bias, so having a method to critically appraise a study is important.

Scoring tools are now available (eg, QUADAS-2 [Quality Assessment of Diagnostic Accuracy Studies] for diagnostic accuracy studies) to guide the critical appraisal of many types of studies; however, these tools require that studies fully report the methods by which the results were obtained. This need has led to the development of reporting guidelines (eg, STARD [Standards for Reporting of Diagnostic Accuracy Studies] for diagnostic accuracy studies and REMARK [Reporting Recommendations for Tumor Marker Prognostic Studies] for prognostic biomarker studies) that list items that should be reported, to allow for critical appraisal. A full list of reporting guidelines is available at the EQUATOR website.[7]

Many journals now require authors to demonstrate that they have followed an appropriate reporting guideline. Still, some disciplines have been slow to adopt reporting guidelines, and adherence is often poor even when journals require them.[8–10]

Quality refers to the certainty of evidence. Conclusions based on high-quality evidence are unlikely to be changed by future research, whereas lower-quality evidence may change with future research findings or may not be applicable in different settings. Quality is initially based on the study type and is then modified by additional considerations. Quality is ranked lower if there is risk of bias, inconsistency between studies, indirectness, imprecision, or publication bias.

Indirectness refers to the applicability of the available evidence to the clinical question and is assessed by determining the relevance of the PICO parameters to the clinical problem at hand.[11] Quality is ranked higher if there is a large effect size.

The GRADE system rates the quality of a body of evidence for a specific outcome using 4 categories (very low, low, moderate, and high). The grading process is transparent and, to the extent possible, standardized and objective. In general, different decision makers should reach similar conclusions regarding the quality of evidence for an outcome. See several publications for a more detailed explanation of the GRADE process for evaluation of evidence.[5,11–14] In particular, we draw attention to 2 articles that focus on strength of evidence for diagnostic recommendations.[15,16]
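The rating logic described above (a starting level based on study type, lowered for each of the 5 concerns and raised for a large effect, with the result expressed in 1 of 4 categories) can be sketched in Python as follows. This is a simplified illustration and not the GRADE working group's procedure: the function name `rate_quality`, the numeric step sizes, the 1-level upgrade for a large effect, and the requirement that the caller supply the starting level are all assumptions made for this example. In practice, each adjustment is a documented judgment rather than a mechanical calculation.

```python
from typing import Dict, Optional

# The 4 GRADE categories, ordered from lowest to highest quality.
LEVELS = ["very low", "low", "moderate", "high"]

# The 5 considerations named in the text that can lower a rating.
DOWNGRADE_FACTORS = {
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
}


def rate_quality(
    start: str,
    concerns: Optional[Dict[str, int]] = None,
    large_effect: bool = False,
) -> str:
    """Return a quality category for one outcome (illustrative only).

    start: initial category chosen from the study type (caller's judgment).
    concerns: maps a downgrade factor to the number of levels to subtract
        (assumed here to be a non-negative integer).
    large_effect: if True, raise the rating by 1 level (an assumption).
    """
    if start not in LEVELS:
        raise ValueError(f"unknown starting level: {start!r}")
    score = LEVELS.index(start)
    for factor, steps in (concerns or {}).items():
        if factor not in DOWNGRADE_FACTORS:
            raise ValueError(f"unknown downgrade factor: {factor!r}")
        if steps < 0:
            raise ValueError("downgrade steps must be non-negative")
        score -= steps
    if large_effect:
        score += 1
    # Keep the result within the 4 defined categories.
    score = max(0, min(score, len(LEVELS) - 1))
    return LEVELS[score]


print(rate_quality("high", {"imprecision": 1}))
print(rate_quality("low", large_effect=True))
```

Recording the inputs in this explicit form mirrors the transparency the grading process aims for: another decision maker can see which concerns were applied to an outcome and can challenge any of them.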

The strength of recommendation depends on the balance between desirable and undesirable outcomes. This assessment depends on relative effect sizes, preferences of patients, and the confidence in the estimates of effect sizes and preferences. GRADE provides a systematic and transparent method for assessing the strength of recommendations.

Most publications in laboratory medicine focus on relatively simple outcomes related to analytical effectiveness and clinical performance.[17] However, developers of laboratory guidelines would benefit from familiarity with the GRADE process, to focus evidence on important outcomes. Because they are focused on outcomes, evidence-based recommendations provide a foundation on which laboratories can deliver value.