Effectiveness of Acupuncture for Low Back Pain: A Systematic Review

Jing Yuan, PhD; Nithima Purepong, MSc; Daniel Paul Kerr, PhD; Jongbae Park, KMD, PhD; Ian Bradbury, PhD; Suzanne McDonough, PhD


Spine. 2008;33(23):E887-E900. 


Materials and Methods

Study Identification

English-language RCTs were searched for in Medline (1966-2008), PubMed (1950-2008), EMBASE (1974-2008), AMED (1985-2008), ProQuest (1986-2008), CINAHL (1982-2008), ISI Web of Science (1981-2008), and the Cochrane Controlled Trials Register (1980-2008). Search terms included the Medical Subject Headings (MeSH) acupuncture/electroacupuncture, low back pain/back pain/lumbar vertebrae/lumbosacral region/sprains and strains, and randomized controlled trials/controlled clinical trials. In addition, the reference lists of relevant reviews and RCTs and 4 key journals were searched manually: Complementary Therapies in Medicine (2000-2007), Spine (1996-2008), Anesthesia (1998-2008), and Clinical Acupuncture and Oriental Medicine (1999-2007).

Study Selection

Two reviewers independently identified potentially eligible trials. Included studies were RCTs, published in English, that compared any type of acupuncture, delivered as an adequate treatment, with a control intervention in adults (≥18 years) with nonspecific LBP, and that used at least 1 of the outcome measures considered most important for LBP: pain, functional disability, general health status, physiologic outcomes, a global measure of improvement, or return to work. RCTs comparing different forms of acupuncture, or studying specific LBP conditions (e.g., LBP in pregnancy), were excluded. Nonspecific LBP was defined as pain below the 12th costal margin and above the inferior gluteal folds, with or without radiating leg pain, for which specific etiologies, such as infection, tumor, osteoporosis, fracture, structural deformity, inflammatory disorder, radicular syndrome or cauda equina syndrome, and other relevant pathologic entities, had been excluded.[22]
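As a rough illustration, the inclusion and exclusion criteria above can be expressed as a single screening predicate. This is a minimal Python sketch: the `Candidate` record, its field names, and the outcome labels are hypothetical, and in the review itself these judgments were made by 2 reviewers, not by code.

```python
from dataclasses import dataclass, field

# Outcome domains listed in the selection criteria (labels are illustrative).
CORE_OUTCOMES = {
    "pain", "functional disability", "general health status",
    "physiologic outcomes", "global improvement", "return to work",
}


@dataclass
class Candidate:
    is_rct: bool
    in_english: bool
    min_age: int                      # age of the youngest participant, in years
    nonspecific_lbp: bool             # False for specific causes/conditions (e.g., pregnancy)
    adequate_acupuncture: bool        # reviewers judged the treatment adequate
    acupuncture_vs_acupuncture: bool  # trial compares different forms of acupuncture
    outcomes: set[str] = field(default_factory=set)


def is_eligible(c: Candidate) -> bool:
    """Apply the inclusion and exclusion criteria; every condition must hold."""
    return (
        c.is_rct
        and c.in_english
        and c.min_age >= 18                      # adults only
        and c.nonspecific_lbp
        and c.adequate_acupuncture
        and not c.acupuncture_vs_acupuncture
        and bool(c.outcomes & CORE_OUTCOMES)     # at least 1 core outcome reported
    )
```

For example, a trial reporting only a sleep-quality outcome would fail the last check even if it met every other criterion.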

Treatment Comparisons

The included studies were grouped by type of control group: no treatment, sham intervention, conventional therapy, and acupuncture or sham acupuncture in addition to conventional therapy.

Assessment of Acupuncture Treatment Adequacy

Details of the interventions were extracted according to the Standards for Reporting Interventions in Controlled Trials of Acupuncture (STRICTA) guidelines.[23] The adequacy of acupuncture treatment was judged by comparing the treatment parameters reported in each RCT with those recommended in textbooks, surveys, and reviews. Trials with inadequate treatment procedures were excluded from this review.

Assessment of Methodologic Quality

Two reviewers independently extracted data and scored the methodologic quality of each trial using the Van Tulder scale,[24] which has been adopted by the European guidelines for LBP.[22] Any disagreement was resolved by consulting a third reviewer to reach consensus. In this review, a study was considered high quality if it scored 6 or more on the Van Tulder scale, carried out a between-group statistical comparison, had at least 40 patients per group (to provide adequate power),[25] and had a dropout rate of less than 20% for short-term (<3 months) and intermediate-term (≥3 months and <1 year) follow-up and less than 30% for long-term (≥1 year) follow-up.[24,26,27] Although dropout rate is already an item on the Van Tulder scale, it was considered separately for each study in this review because it can substantially affect study results. High-quality studies were given more weight in the best-evidence synthesis of the effectiveness of acupuncture for nonspecific LBP.
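The high-quality criteria above combine four separate checks. The following Python sketch encodes them under stated assumptions: the `Trial` fields are hypothetical names, follow-up is expressed in months (with 12 months taken as 1 year), and dropout is a fraction between 0 and 1 at the follow-up point being assessed.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    van_tulder_score: int            # total Van Tulder score for the trial
    between_group_comparison: bool   # was a between-group statistical test reported?
    smallest_group_size: int         # patients in the smallest arm
    follow_up_months: float          # follow-up time point being assessed
    dropout_rate: float              # fraction (0-1) lost by that time point


def max_dropout(follow_up_months: float) -> float:
    """Dropout ceiling: 20% for short (<3 mo) and intermediate (3 to <12 mo)
    follow-up, 30% for long-term (>=12 mo) follow-up."""
    return 0.30 if follow_up_months >= 12 else 0.20


def is_high_quality(t: Trial) -> bool:
    """True only if every high-quality criterion is met."""
    return (
        t.van_tulder_score >= 6
        and t.between_group_comparison
        and t.smallest_group_size >= 40   # at least 40 patients in every group
        and t.dropout_rate < max_dropout(t.follow_up_months)
    )
```

Checking the smallest arm captures "at least 40 patients per group". Because dropout is judged per follow-up point in this sketch, the same trial could qualify at 6 months but not at 1 year.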

Data Analysis

Best Evidence Synthesis. Levels of evidence for the effectiveness of acupuncture for nonspecific LBP were assigned according to the methodologic quality and the results of the original RCTs:[24,26]

Level 1 (strong evidence): consistent findings among multiple high-quality RCTs (i.e., >75% of the RCTs report the same findings).

Level 2 (moderate evidence): consistent findings among multiple low-quality RCTs and/or 1 high-quality RCT.

Level 3 (limited evidence): 1 low-quality RCT.

Level 4 (conflicting evidence): inconsistent findings among multiple RCTs.

Level 5 (no evidence): no RCTs.

For the 2 primary outcomes, pain and functional disability, the result of each original RCT was based on whether there was a statistically significant between-group difference (P < 0.05) or, when P-values were not available, on the authors' conclusions.
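One way to operationalize these rules is sketched below in Python. The source does not spell out every edge case, so this is an assumed reading: a trial's finding is treated as positive when its between-group P-value is below 0.05 (assumed to favor acupuncture) or, without a P-value, when the authors conclude a benefit; findings count as consistent when more than 75% of the trials agree; and Level 1 requires at least 2 high-quality RCTs. `TrialResult`, `is_positive_finding`, and `evidence_level` are hypothetical names.

```python
from typing import NamedTuple, Optional, Sequence


class TrialResult(NamedTuple):
    high_quality: bool
    favors_acupuncture: bool  # positive finding for the outcome being synthesized


def is_positive_finding(p_value: Optional[float], authors_conclude_benefit: bool) -> bool:
    """Use the between-group P-value when reported, else the authors' conclusion."""
    if p_value is not None:
        return p_value < 0.05
    return authors_conclude_benefit


def evidence_level(trials: Sequence[TrialResult]) -> int:
    """Return the level of evidence (1 = strong ... 5 = none) under this sketch's rules."""
    if not trials:
        return 5                                    # no evidence: no RCTs
    if len(trials) == 1:
        return 2 if trials[0].high_quality else 3   # 1 high-quality vs 1 low-quality RCT
    n_positive = sum(t.favors_acupuncture for t in trials)
    agreement = max(n_positive, len(trials) - n_positive) / len(trials)
    if agreement <= 0.75:
        return 4                                    # conflicting evidence
    n_high = sum(t.high_quality for t in trials)
    return 1 if n_high >= 2 else 2                  # strong vs moderate evidence
```

Note that with the strict ">75%" threshold, 3 of 4 trials agreeing (exactly 75%) is classified as conflicting in this sketch.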

Effect Size. Review Manager 4.2.7 was used for statistical analysis. Means and standard deviations (SD) for pain and functional disability were extracted and, where possible, the treatment effect of each RCT was plotted as a point estimate in a random-effects model: the standardized mean difference (SMD) for continuous outcomes and the odds ratio (OR) for dichotomous outcomes, each with its 95% confidence interval (95% CI) and 2-tailed P-value. The formulas are:

SMD = (Mean in the acupuncture group - Mean in the control group)/Pooled SD of both groups

OR = The ratio of successes to failures in the acupuncture group/The ratio of successes to failures in the control group
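The two formulas can be computed directly from extracted summary data. The Python sketch below follows them as written, using the usual pooled-SD definition; the 95% CIs use common large-sample approximations (a normal approximation for the SMD and Woolf's log method for the OR), which are assumptions rather than details given in the source. Review Manager reports a bias-corrected SMD (Hedges' adjusted g), so its values may differ slightly for small trials. The function names are hypothetical.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% CI


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled SD of both groups, weighting each variance by its degrees of freedom."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def smd(mean_acu, sd_acu, n_acu, mean_ctl, sd_ctl, n_ctl):
    """SMD = (acupuncture mean - control mean) / pooled SD, with an approximate 95% CI."""
    d = (mean_acu - mean_ctl) / pooled_sd(sd_acu, n_acu, sd_ctl, n_ctl)
    se = math.sqrt((n_acu + n_ctl) / (n_acu * n_ctl) + d**2 / (2 * (n_acu + n_ctl)))
    return d, d - Z_95 * se, d + Z_95 * se


def odds_ratio(succ_acu, fail_acu, succ_ctl, fail_ctl):
    """OR = (successes/failures, acupuncture) / (successes/failures, control),
    with a 95% CI computed on the log scale. All four cells must be non-zero."""
    or_ = (succ_acu / fail_acu) / (succ_ctl / fail_ctl)
    se_log = math.sqrt(1 / succ_acu + 1 / fail_acu + 1 / succ_ctl + 1 / fail_ctl)
    log_or = math.log(or_)
    return or_, math.exp(log_or - Z_95 * se_log), math.exp(log_or + Z_95 * se_log)
```

For pain and disability scores, lower is better, so a negative SMD favors acupuncture; for the OR, values above 1 favor acupuncture when "success" means improvement.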

Effect sizes of 0.20, 0.50, and 0.80 were interpreted as small, medium, and large, respectively.[28] For cross-over trials, the summary data were analyzed as if they had been derived from parallel-group trials. In this review, effect sizes were grouped by control intervention and follow-up time point.
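These reference values translate into a simple labelling rule on the absolute SMD. In this hypothetical Python helper, treating each value as the lower bound of its band, and labelling |SMD| < 0.20 as "below small", are assumptions; the source defines only the three reference values.

```python
def effect_size_label(smd: float) -> str:
    """Label the magnitude of an SMD using the 0.20 / 0.50 / 0.80 reference values."""
    magnitude = abs(smd)  # direction is ignored; only size is labelled
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "medium"
    if magnitude >= 0.20:
        return "small"
    return "below small"
```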

Clinical Significance. To determine whether the changes observed with acupuncture were clinically significant compared with other forms of treatment, mean differences in pain and functional disability were calculated (mean change over time in the acupuncture group minus mean change over time in the control group) and then compared with a minimal clinically important difference (MCID). The MCID was defined as the cut-off point that best discriminated between improvement and nonimprovement for individual patients in clinical practice. Because the overall effect of acupuncture (specific and nonspecific) was considered, the MCID for pain reduction in this review was set at 2 points on a 0-10 scale or 20 points on a 0-100 scale (i.e., 20% of the total score).[29,30,31,32] An MCID was also set for functional disability, e.g., a 30% reduction from the baseline score on the 24-item Roland-Morris Disability Questionnaire (RMDQ).[33,34] Clinical significance was deemed to be clearly achieved when both limits of the 95% CI of the mean difference exceeded the MCID in the direction favoring acupuncture.
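The clinical-significance check can be sketched as follows. All of these are assumptions: change scores are follow-up minus baseline, so improvement is negative; the 95% CI of the mean difference uses a normal approximation for independent groups; and the RMDQ MCID is applied as 30% of the group's baseline mean. All function names are hypothetical.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% CI


def pain_mcid(scale_max: float) -> float:
    """Pain MCID: 20% of the scale (2 points on 0-10, 20 points on 0-100)."""
    return 0.20 * scale_max


def rmdq_mcid(baseline_mean: float) -> float:
    """Assumed group-level reading of 'a 30% reduction from baseline' on the RMDQ."""
    return 0.30 * baseline_mean


def mean_difference_ci(change_acu, sd_acu, n_acu, change_ctl, sd_ctl, n_ctl):
    """Acupuncture mean change minus control mean change, with an approximate 95% CI."""
    md = change_acu - change_ctl
    se = math.sqrt(sd_acu**2 / n_acu + sd_ctl**2 / n_ctl)
    return md, md - Z_95 * se, md + Z_95 * se


def clearly_clinically_significant(ci_low: float, ci_high: float, mcid: float) -> bool:
    """Improvement is negative here, so both CI limits must lie below -MCID.
    Since ci_low <= ci_high, checking the upper limit is sufficient."""
    return ci_high < -mcid
```

For instance, with mean pain changes of -4.0 (acupuncture) and -1.0 (control) on a 0-10 scale, SD 1.5 and 50 patients per group, the mean difference is -3.0 with a 95% CI of roughly -3.6 to -2.4, which lies entirely beyond the 2-point MCID.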

