Effectiveness of Placebo Interventions for Patients With Nonspecific Low Back Pain

A Systematic Review and Meta-analysis

Rob H.W. Strijkers; Marco Schreijenberg; Heike Gerger; Bart W. Koes; Alessandro Chiarotto


Pain. 2021;162(12):2792-2804. 


The study protocol for this systematic review was written a priori and was registered on February 4, 2019 (PROSPERO Registration number CRD42019127465). The guidance of the Cochrane Back and Neck Review Group was followed for the methods.[19] The PRISMA statement was adopted for the reporting of this systematic review.[44]

Databases and Search Strategy

Embase, MEDLINE (Ovid), and Cochrane CENTRAL databases were searched from inception up to December 5, 2019. The search strategies for this systematic review were based on the strategy used in Hróbjartsson and Gøtzsche's 2010 Cochrane review,[33] with "low back pain" as an additional search key term. The new search strategy was developed in collaboration with an experienced information specialist. Elements of the search were "back pain," "placebo," and "randomized controlled trial" and corresponding synonyms (Appendix A, supplemental digital content available at https://links.lww.com/PAIN/B345). No language restrictions were adopted for the search. Furthermore, the references of all included studies of the most recent update of the Cochrane review by Hróbjartsson and Gøtzsche were searched manually for additional studies.[33]

Eligibility Criteria

Inclusion criteria were as follows:

  1. randomized controlled trials with a 3-group design (active intervention group, placebo group, and no-intervention group) or at least a comparison of a placebo intervention vs no intervention (in a 2-group design);

  2. studies recruiting adult patients (>18 years old) with nonspecific LBP; in trials with a mixed population (eg, LBP and neck pain patients), at least 75% of participants had LBP for the trial to be eligible;

  3. the study has evaluated the effectiveness of a placebo intervention; and

  4. the study is available as a full-text article.

We chose to include only randomized controlled trials (RCTs) because they provide the least biased estimates of the effectiveness of interventions for clinical practice.[64] Following Hróbjartsson and Gøtzsche,[33] we defined a placebo intervention as any intervention labelled in the trial report as being a placebo or an analogous term, such as sham, fake, dummy, or nonspecific or unspecific treatment. Randomized controlled trials in which the placebo group and no treatment group received the same underlying treatment (eg, "usual care") were included. Studies focusing on nocebo treatments or effects were excluded.

Outcomes of Interest

Pain intensity, physical functioning, and health-related quality of life (HRQoL; ie, core outcome domains for LBP[8]) measured at short-term (up to 3 months), medium-term (4–12 months), and long-term (1 or 2 years) follow-up were the primary outcomes of this review. If an RCT did not include at least one of these outcomes, the article was excluded. In case of multiple short-term outcomes, the outcome closest to 1-month follow-up was used for the primary analysis.
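The rule for choosing among several short-term measurements can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `pick_short_term` and its parameters are hypothetical, the short-term window is taken as "up to 3 months" from the text, and the tie-breaking behavior (earliest listed time point wins) is an assumption the text does not address.

```python
def pick_short_term(followups_months, target=1.0, short_term_max=3.0):
    """Return the short-term follow-up time (in months) closest to the target.

    followups_months: follow-up times reported by one trial, in months.
    Returns None if the trial reports no short-term follow-up.
    """
    # Keep only time points inside the short-term window (up to 3 months).
    short_term = [t for t in followups_months if 0 < t <= short_term_max]
    if not short_term:
        return None
    # Closest to 1 month; on a tie, min() keeps the first one listed (assumption).
    return min(short_term, key=lambda t: abs(t - target))


print(pick_short_term([0.5, 2, 6]))
print(pick_short_term([6, 12]))
```

For a trial reporting outcomes at 2 weeks, 2 months, and 6 months, the 2-week measurement would be used, because it lies closest to 1 month within the short-term window.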

Selection of Studies

Two reviewers (M.S. and A.C.) independently screened titles and abstracts of the articles found in the literature search to decide which articles to retrieve in full text. The same 2 reviewers read the full-text articles to include all relevant studies according to the selection criteria, as mentioned above. A consensus meeting was held to discuss studies about which there was disagreement between the reviewers. In case of disagreement, a third independent reviewer (B.W.K.) made the final decision whether or not to include the study.

Data Extraction

Data extraction on outcomes was performed in duplicate by 2 reviewers (R.H.W.S. and H.G.), whereas data extraction on characteristics of included studies and their patients was performed by one reviewer (R.H.W.S.) and double checked by a second one (M.S.). Standardized extraction forms were used (Appendix B, supplemental digital content available at https://links.lww.com/PAIN/B345). These characteristics included study design, sample size, baseline patients' characteristics (eg, age, sex, pain duration, pain intensity, physical functioning, chronicity [acute: <6 weeks, subacute: 6 to 12 weeks, and chronic: >12 weeks]), interventions' characteristics, follow-up time, primary outcomes, results for outcomes at all follow-ups (mean, SD, and sample size), and funding source. We performed the meta-analysis in RevMan (Review Manager v5.3, The Nordic Cochrane Centre, Copenhagen, DK).

Whenever possible, means and SDs were extracted directly from the article or derived from available values (medians, IQR, 95% confidence interval [CI], ranges, etc) using appropriate estimation formulas.[61] We used GetData Graph Digitizer to retrieve data from article figures and graphs when no written data were available. If articles reported change from baseline, the endpoint mean was calculated and the SD of the baseline was used, as suggested by the Cochrane Handbook v6. In case of multiple placebo groups (eg, enhanced and nonenhanced placebo interventions), the results of the placebo groups were combined by taking the weighted mean and SDs of the 2 groups.[12]
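The derivations described above can be illustrated with a short sketch. It assumes the standard Cochrane Handbook formulas: the large-sample conversion of a 95% CI of a mean into an SD, and the formula for combining two groups into one. The text does not state which exact formulas were applied, so the function names and the choice of formulas here are assumptions for illustration only.

```python
import math


def sd_from_ci(lower, upper, n, z=1.96):
    """Approximate the SD of a group from the 95% CI of its mean.

    Uses the large-sample normal approximation: SE = (upper - lower) / (2 * z).
    """
    return math.sqrt(n) * (upper - lower) / (2 * z)


def endpoint_from_change(baseline_mean, change_mean):
    """Endpoint mean when only the change from baseline is reported."""
    return baseline_mean + change_mean


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Combine two groups (eg, two placebo arms) into a single group.

    Returns (n, mean, sd) of the pooled group: a sample-size-weighted mean,
    and an SD that also accounts for the difference between the group means.
    """
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2
           + (n1 * n2 / n) * (m1 - m2) ** 2) / (n - 1)
    return n, mean, math.sqrt(var)


# Hypothetical numbers: two placebo arms on a 0-10 pain scale.
print(combine_groups(30, 4.0, 2.0, 20, 5.0, 2.5))
```

Note that the combined SD is larger than a simple average of the two SDs whenever the group means differ, which is why the between-mean term appears in the numerator.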

Risk of Bias Assessment

Two reviewers (R.H.W.S. and M.S.) independently scored the risk of bias (RoB) of included studies using the revised Cochrane risk-of-bias tool for randomized trials, version of August 22, 2019.[28] This RoB tool consists of 23 items in 5 subdomains that can be answered with "yes," "probably yes," "probably no," "no," or "no information." This results in a trial overall RoB judgement that may be "low," "high," or "some concerns." A consensus meeting was held to discuss studies about which there was disagreement between the reviewers. In case a consensus was not reached, a third independent reviewer (A.C.) made the final decision concerning the risk of bias assessment.

Meta-analysis and Interpretation of Results

Between-group mean differences (MDs) were calculated for continuous or ordinal outcomes measured with the same instruments; standardized mean differences (SMDs) were estimated if different instruments were used to assess the same outcome (eg, Roland–Morris Disability Questionnaire and Oswestry Disability Index to measure physical functioning). Statistical pooling was performed if there was clinical (sufficiently homogeneous study population) and methodological homogeneity (comparison, outcome, and assessment time points) across trials. A random-effects model rather than a fixed-effects model was used on the assumption that the included studies differed to some extent with respect to clinical and other factors.
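As an illustration of the two effect measures, the sketch below computes a mean difference and a standardized mean difference for one trial. The SMD is computed as Hedges' adjusted g with a commonly used variance approximation; this choice is an assumption, since the text names the software but not the exact estimator, and the function names are hypothetical.

```python
import math


def mean_difference(n1, m1, sd1, n2, m2, sd2):
    """Mean difference (group 1 minus group 2) and its standard error."""
    md = m1 - m2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return md, se


def hedges_g(n1, m1, sd1, n2, m2, sd2):
    """Standardized mean difference (Hedges' adjusted g) and its standard error."""
    df = n1 + n2 - 2
    # Pooled within-group SD.
    s_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / s_pooled
    # Small-sample correction factor.
    j = 1 - 3 / (4 * df - 1)
    # Large-sample approximation of the variance of d (assumed estimator).
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j * math.sqrt(var_d)


# Hypothetical trial: placebo vs no intervention, 50 patients per group.
print(mean_difference(50, 4.0, 2.0, 50, 5.0, 2.0))
print(hedges_g(50, 4.0, 2.0, 50, 5.0, 2.0))
```

Because the SMD divides by the pooled SD, trials using different disability questionnaires can be placed on a common, unitless scale before pooling.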

To assess the presence of between-study heterogeneity of effect sizes, τ² was calculated as an estimate of the variance between the effect sizes from individual studies.[26] For the primary outcome, a value of τ² = 0.04 was considered low heterogeneity, 0.09 moderate, and 0.16 high heterogeneity.[3] In addition, the I² statistic was used, which can be interpreted as the percentage of total variation across studies attributable to between-study heterogeneity rather than random error.[27] Meta-analyses were performed in Review Manager (RevMan) 5.3. If at least 10 studies were included in a meta-analysis, a funnel plot was created to explore whether there was asymmetry among the trial results (which may indicate publication bias). Separate analyses were conducted for patients with (sub)acute or chronic LBP, consistent with the method guideline for systematic reviews of the Cochrane Back and Neck Group.[19] Subgroup analyses were also considered for different placebo interventions (eg, open-label placebo, placebo acupuncture, and placebo tape). A sensitivity analysis was conducted, excluding studies rated as having a high RoB. For all analyses, a 2-sided P < 0.05 was used to indicate statistical significance, ie, whether placebo interventions were more effective than no intervention. Based on recent studies estimating the smallest worthwhile effect for conservative interventions vs no intervention in patients with LBP,[9,17] a 20% between-group difference was a priori established as a clinically relevant effect.
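A random-effects pooled estimate together with τ² and I² can be sketched as below. The sketch uses the DerSimonian-Laird moment estimator with inverse-variance weights, a common choice for this model; whether it matches the exact settings used in the review is an assumption, and the function name `random_effects` and the example numbers are hypothetical.

```python
import math


def random_effects(effects, ses):
    """DerSimonian-Laird random-effects meta-analysis.

    effects: per-study effect estimates (MD or SMD).
    ses: per-study standard errors.
    Returns a dict with the pooled estimate, its 95% CI, tau2, and I2 (%).
    """
    k = len(effects)
    w = [1 / se**2 for se in ses]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau2 to each study's variance.
    w_re = [1 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical SMDs and standard errors from three trials.
print(random_effects([-0.40, -0.10, -0.65], [0.15, 0.20, 0.25]))
```

When τ² is 0 the random-effects weights reduce to the fixed-effect weights, so the two models give the same pooled estimate.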

Evidence Synthesis

The Grading of Recommendations Assessment, Development, and Evaluation (GRADE)[22] approach was subsequently used to rate the quality of evidence of pooled estimates as "high," "moderate," "low," or "very low," as suggested by the Cochrane Back and Neck Group.[19] For each specific outcome, evidence was downgraded one or more levels for each of the 5 GRADE components that was met across all studies measuring that particular outcome: limitations, inconsistency, indirectness, imprecision, and publication bias.[19,22] The following rules were used for downgrading quality of evidence:

  1. Limitations: one level if 50% to 75% of included trials scored "low" or "some concerns" on RoB and 2 levels if less than 50% of included trials scored "low" or "some concerns" on RoB.

  2. Inconsistency: one level if the value for the I² statistic was between 40% and 75%.

  3. Indirectness: one level if a study did not specifically mention the target population (adult patients with nonspecific LBP) or if it was unclear whether the intervention could be indeed considered a placebo and whether the control was indeed a no intervention with respect to the placebo.

  4. Imprecision: one level if the total sample for a specific outcome was <100 patients and 2 levels if the total sample for a specific outcome was <50 patients.

  5. Publication bias: one level if the funnel plot indicated that publication bias might be present.
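The downgrading rules above can be expressed as a small decision function. This is a sketch under stated assumptions, not the authors' implementation: the function and parameter names are hypothetical, range boundaries are treated as inclusive, ratings start at "high" (the usual GRADE starting point for evidence from RCTs), and an I² above 75% triggers no downgrade here only because the rules as listed do not cover that case.

```python
LEVELS = ["high", "moderate", "low", "very low"]


def downgrade_levels(pct_low_or_some_concerns, i2, indirect, total_n,
                     funnel_asymmetry):
    """Count the levels to downgrade for one outcome, following the 5 rules."""
    levels = 0
    # 1. Limitations: share of trials rated "low" or "some concerns" on RoB.
    if pct_low_or_some_concerns < 50:
        levels += 2
    elif pct_low_or_some_concerns <= 75:
        levels += 1
    # 2. Inconsistency: I2 between 40% and 75% (inclusive bounds assumed).
    if 40 <= i2 <= 75:
        levels += 1
    # 3. Indirectness: unclear population, placebo, or no-intervention control.
    if indirect:
        levels += 1
    # 4. Imprecision: total sample size for the outcome.
    if total_n < 50:
        levels += 2
    elif total_n < 100:
        levels += 1
    # 5. Publication bias: funnel plot suggests asymmetry.
    if funnel_asymmetry:
        levels += 1
    return levels


def grade(levels):
    """Map the number of downgraded levels to a GRADE rating, floored at 'very low'."""
    return LEVELS[min(levels, len(LEVELS) - 1)]


# Hypothetical outcome: 60% of trials low/some concerns, I2 = 50%, 80 patients.
print(grade(downgrade_levels(60, 50, False, 80, False)))
```

Because several components can each remove one or two levels, the rating is capped at "very low" once three or more levels have been subtracted.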