Five-week Outcomes From a Dosing Trial of Therapeutic Massage for Chronic Neck Pain

Karen J. Sherman, PhD, MPH; Andrea J. Cook, PhD; Robert D. Wellman, MS; Rene J. Hawkes, BS; Janet R. Kahn, PhD; Richard A. Deyo, MD, MPH; Daniel C. Cherkin, PhD


Ann Fam Med. 2014;12(2):112-120. 


We conducted a 6-arm trial with 5 dosing schedules of massage. The trial protocol and all study procedures were approved by the Group Health Research Institute institutional review board. Before being screened for eligibility by telephone, prospective participants gave oral consent. Those still eligible gave written consent before an in-person examination and study enrollment. The study protocol, which has been published in detail,[17] is summarized below.


Study participants were recruited from Group Health, an integrated health care system serving about 500,000 persons, and from the general population of greater Seattle. Adults aged 20 to 64 years with chronic nonspecific neck pain lasting at least 3 months who were able and willing to attend treatments at our clinic and give informed consent were potentially eligible. From June 2010 through August 2011, we recruited prospective participants using mailed invitations to Group Health members with neck pain–related visits to primary care clinicians, advertisements in the health plan's magazine, posters, a study website, neighborhood blogs, and direct-mail postcards.

We excluded individuals whose neck pain had a pathologically identifiable cause (eg, vertebral fracture, metastatic cancer), was complex (eg, cervical radiculopathy, recent automobile accident), or was too mild, defined as scoring less than 4 on a pain intensity scale ranging from 0 to 10 and less than 5 on the Neck Disability Index (NDI) ranging from 0 to 50. We also excluded those with potential contraindications for massage (eg, hypersensitivity to touch), any massage within the last 3 months, massage for neck pain within the last year, or an inability to give informed consent or speak English. Finally, we excluded persons with medicolegal issues related to neck or back pain.


At the end of the baseline interview, a research assistant electronically randomized each participant to 1 of the 6 treatment groups. Treatment assignments were generated by a statistician (A.J.C.) using the freely available R software (version 2.11.0, R-Project for Statistical Computing), with random block sizes of 6 and 12 within 2 strata, based on NDI scores (5–14 and ≥15). They were embedded in the computer-assisted telephone interviewing program and inaccessible to study staff before randomization.
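The allocation scheme described above (permuted blocks of randomly chosen size 6 or 12, generated separately within 2 NDI strata) can be sketched as follows. This is an illustration only: the original sequence was generated in R, and the arm labels, function names, seed, and list length here are hypothetical.

```python
import random

# The 6 study groups; the label wording is illustrative.
ARMS = [
    "wait list",
    "30 min x 2/week",
    "30 min x 3/week",
    "60 min x 1/week",
    "60 min x 2/week",
    "60 min x 3/week",
]


def stratum_for(ndi_score):
    """Map a baseline NDI score to one of the 2 strata named in the text."""
    if ndi_score < 5:
        raise ValueError("no stratum is defined for NDI scores below 5")
    return "NDI 5-14" if ndi_score <= 14 else "NDI >=15"


def permuted_block_sequence(min_length, rng):
    """Concatenate balanced, shuffled blocks until at least min_length
    assignments exist. Each block has a randomly chosen size of 6 or 12,
    so every arm appears once or twice per block."""
    sequence = []
    while len(sequence) < min_length:
        block_size = rng.choice([6, 12])
        block = ARMS * (block_size // len(ARMS))
        rng.shuffle(block)
        sequence.extend(block)
    return sequence


# One independent allocation list per stratum (seed and length are hypothetical).
rng = random.Random(2010)
allocation = {
    stratum: permuted_block_sequence(120, rng)
    for stratum in ("NDI 5-14", "NDI >=15")
}
```

Because only whole blocks are appended, the groups stay balanced within each stratum at the end of every block, while the varying block size makes the next assignment harder to predict.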


For the 4-week primary treatment period, participants were randomized to a wait list control group or to 5 different dosing schedules of massage: 30-minute treatments either 2 or 3 times per week, or 60-minute treatments 1, 2, or 3 times per week. We defined adherence as completion of at least 75% of the visits in each protocol.
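As a worked illustration of the adherence definition, the minimum number of visits for each massage schedule can be computed as below. The sketch assumes that the planned number of visits equals visits per week multiplied by the 4 treatment weeks; the names are hypothetical.

```python
import math

TREATMENT_WEEKS = 4
ADHERENCE_FRACTION = 0.75

# Visits per week for each massage schedule (the wait list has no visits).
SCHEDULES = {
    "30 min x 2/week": 2,
    "30 min x 3/week": 3,
    "60 min x 1/week": 1,
    "60 min x 2/week": 2,
    "60 min x 3/week": 3,
}


def visits_required(visits_per_week):
    """Smallest visit count that reaches 75% of the planned visits."""
    planned = visits_per_week * TREATMENT_WEEKS
    return math.ceil(ADHERENCE_FRACTION * planned)


required = {name: visits_required(n) for name, n in SCHEDULES.items()}
```

Under this assumption, adherence corresponds to at least 3 of 4, 6 of 8, or 9 of 12 planned visits, depending on the schedule.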

On the basis of an earlier study,[18] we defined distinct treatment protocols for both 30- and 60-minute treatments, which included range of motion assessment, hands-on check-in, massage applied directly to the neck, addressing compensatory patterns, and integration (reestablishment within a patient of being in a unified body after having received intensive isolated work). Therapists were given time limits for each part of the massage and permitted to use a broad range of massage techniques. No self-care recommendations were permitted. Eight licensed massage therapists with at least 5 years of experience were trained in the study protocol and provided massage treatments in the research clinic at Group Health. Treatment fidelity was monitored by a research assistant who was also a massage therapist and who observed a treatment for all therapists and 34% of those randomized to massage (4% of all treatments).

Outcomes and Follow-up

Outcomes were assessed at baseline and again at 5 weeks (a week after treatment completion) by telephone interviewers who were unaware of treatment assignment. Our prespecified primary outcomes were clinically important improvements in neck pain–related dysfunction and pain intensity. We attempted to obtain follow-up data from all trial participants.

The 10-item, 51-point NDI was used to measure neck pain–related dysfunction; higher scores indicate greater disability. The index shows high internal consistency and test-retest reliability, is responsive to change, and correlates well with the McGill Pain Questionnaire.[19,20] The 11-point numerical rating scale was used to measure neck pain intensity; higher scores indicate more intense pain. This scale has demonstrated sensitivity to change and is correlated with other measures of pain intensity.[21] Secondary outcomes included mean NDI and neck pain intensity; 3 types of activity limitation;[22] perceived stress, measured by the 10-item Perceived Stress Scale (higher scores indicate greater stress);[22] a single-item, 7-point patient global rating of improvement (higher scores indicate less improvement); and a single question about overall patient satisfaction.[23]

Sample Size and Power

Details of our sample size calculations and all assumptions have been provided previously[17] and are summarized briefly here. Because this was a 6-arm dosing study, the sample size calculation was more complicated than for a simple 2-group comparison. Our sample size was chosen to ensure adequate power to detect a significant difference between at least 2 of the 5 massage treatment groups (and not just a difference between 1 or more of the treatment groups and the control group). We powered our study for the primary binary outcome of a clinically meaningful improvement in neck-related dysfunction (≥5 points on NDI). With 34 participants per group, we had 97% power to find a significant difference between at least 2 of the 6 groups (assuming that 7% of control participants and 35% to 70% of massage participants would improve) and 80% power to find a significant difference between 2 active massage groups. Assuming 10% loss to follow-up, we recruited 38 participants per group, for a total sample size of 228 in the trial.
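The recruitment targets follow from simple arithmetic on the figures above. The sketch below uses a common attrition inflation formula, n / (1 - loss); this formula is an assumption, since the text does not state how 38 was derived from 34, and the names are hypothetical.

```python
import math

COMPLETERS_PER_GROUP = 34  # participants per group needed for the stated power
EXPECTED_LOSS = 0.10       # assumed loss to follow-up
N_GROUPS = 6               # 5 massage schedules plus the wait list


def inflate_for_attrition(n_completers, loss_fraction):
    """Participants to recruit so that n_completers are expected to remain."""
    return math.ceil(n_completers / (1.0 - loss_fraction))


recruited_per_group = inflate_for_attrition(COMPLETERS_PER_GROUP, EXPECTED_LOSS)
total_sample_size = recruited_per_group * N_GROUPS
```

With these inputs the calculation gives 38 participants per group and 228 overall, matching the reported totals.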

Statistical Analysis

We calculated summary statistics (frequencies, means, and standard deviations) for baseline study participant characteristics by treatment group to identify any important baseline differences across groups. Following the a priori primary analysis plan, we evaluated differences across treatment groups in the primary outcomes, a clinically meaningful improvement in neck-related dysfunction (≥5 points on NDI)[24] or in pain (≥30% reduction on neck pain intensity scale)[25] measured at 5 weeks after randomization, using modified Poisson regression, that is, a Poisson log-link regression model fitted with generalized estimating equations (GEE) and robust standard errors.[26] To avoid the pitfall of multiple comparisons related to having 6 treatment groups, we used the Fisher protected least-significant difference approach.[27] This approach makes pairwise comparisons among the 6 treatment groups only if the overall omnibus Wald test statistic is significant. Prespecified secondary analyses used linear regression models with GEE and robust standard errors to estimate differences in mean changes from baseline across treatment groups for the 5-week NDI and neck pain intensity outcomes. All adjusted models included baseline NDI and neck pain intensity, age, sex, neck pain longer than 5 years in duration, use of medications for neck pain, and race (white non-Hispanic vs other). All adjustment variables were prespecified except for race, which showed larger than expected differences across groups at baseline and met the criteria for adjustment: it was not related to any other prespecified adjustment variable, and it may be predictive of outcome response, drop-out, or both.
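The 2 responder definitions and the gatekeeping logic of the Fisher protected approach can be illustrated with a short sketch. It does not reproduce the regression models, which were fitted in SAS; the function names are hypothetical, and treating "significant" as a strict P < .05 is an assumption.

```python
from itertools import combinations

GROUPS = [
    "wait list",
    "30 min x 2/week",
    "30 min x 3/week",
    "60 min x 1/week",
    "60 min x 2/week",
    "60 min x 3/week",
]


def ndi_responder(baseline_ndi, followup_ndi):
    """Clinically meaningful improvement in dysfunction: NDI drop of >= 5 points."""
    return (baseline_ndi - followup_ndi) >= 5


def pain_responder(baseline_pain, followup_pain):
    """Clinically meaningful improvement in pain: >= 30% reduction from baseline.
    Integer arithmetic avoids floating-point edge cases at exactly 30%."""
    if baseline_pain <= 0:
        return False  # a percentage reduction is undefined without baseline pain
    return 10 * (baseline_pain - followup_pain) >= 3 * baseline_pain


def protected_pairwise(groups, omnibus_p, alpha=0.05):
    """Fisher protected approach: list pairwise comparisons only when the
    omnibus test across all groups is significant; otherwise list none."""
    if omnibus_p >= alpha:
        return []
    return list(combinations(groups, 2))


# Example with hypothetical scores and a hypothetical omnibus P value.
improved_function = ndi_responder(baseline_ndi=20, followup_ndi=14)
improved_pain = pain_responder(baseline_pain=6, followup_pain=4)
pairs = protected_pairwise(GROUPS, omnibus_p=0.01)
```

With 6 groups, a significant omnibus test opens 15 pairwise comparisons; a nonsignificant one stops the analysis before any pairwise test is run.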

We used similar adjusted models to analyze the secondary outcomes. For the binary outcomes (more than 7 days in the past 4 weeks on which normal activities were cut by at least one-half because of neck pain, at least 1 day in the past 4 weeks on which neck pain kept the participant in bed or lying down for most of the day, and at least 1 day in the past 4 weeks on which neck pain kept the participant out of work or school), we adjusted only for baseline NDI and neck pain intensity because of model-fitting issues for these uncommon outcomes. For the continuous secondary outcome of perceived stress, we additionally adjusted for the baseline Perceived Stress Scale score.

All analyses were conducted according to intention to treat (ie, comparing participants in the groups to which they were originally randomly assigned). Analyses were performed using SAS statistical software (version 9.2; SAS Institute Inc). All P values are 2-sided and Wald-based, with statistical significance defined at the P = .05 level.