Study Protocol: The Back Pain Outcomes Using Longitudinal Data (BOLD) Registry

Jeffrey G Jarvik; Bryan A Comstock; Brian W Bresnahan; Srdjan S Nedeljkovic; David R Nerenz; Zoya Bauer; Andrew L Avins; Kathryn James; Judith A Turner; Patrick Heagerty; Larry Kessler; Janna L Friedly; Sean D Sullivan; Richard A Deyo


BMC Musculoskelet Disord. 2012;13(64) 



BOLD Registry

Overview The overall goal of this project is to establish a sustainable and rich registry to evaluate prospectively the effectiveness, safety, and cost-effectiveness of diagnostic approaches and interventions for elderly patients with back pain. The registry can also be used to identify and recruit patients for additional studies. We plan to recruit 5,000 patients age 65 and older with new episodes of health care visits for back pain (defined as no prior visits to a health care provider for back pain care within 6 months). Patients who enroll in the registry complete validated, standardized measures of pain, back pain-related disability, and health-related quality of life at enrollment and 3, 6 and 12 months later. Our project includes a demonstration comparative effectiveness study of early (less than six weeks after initial medical visit) imaging versus no early imaging for elderly patients with back pain. In this observational cohort study we will test the hypothesis that early imaging is associated with more interventions and adverse labeling (where simply assigning a diagnostic label results in worse health-related quality of life), greater disability and higher levels of pain compared to matched controls who do not undergo early imaging. We will also test the hypothesis that racial and ethnic minorities will have lower rates of early imaging than non-minorities. In parallel with construction of the BOLD registry, we are also performing a double-blind, randomized controlled trial of epidural steroid with local anesthetic compared with a local anesthetic injection alone for spinal stenosis; this component of the study is described elsewhere.[7]

Participating Centers BOLD is recruiting patients at three integrated health care systems: Kaiser Permanente of Northern California (KPNC), Henry Ford Health System (HFHS), and Harvard Vanguard Medical Associates/Harvard Pilgrim Health Care (HVMA/HPHC). We chose these sites for their geographic and demographic diversity. Confining our registry to the integrated health systems with comprehensive electronic medical record systems allows us to take advantage of the well-defined populations that their patients comprise as well as the wealth of data available in these systems, including health care utilization.

The University of Washington's Comparative Effectiveness, Cost and Outcomes Research Center (CECORC) and Center for Biomedical Statistics (CBS) serve as the Data Coordinating Center (DCC) for BOLD. A collaborator at Oregon Health and Sciences University (RAD) is also part of the DCC.

Institutional Review Board (IRB) Approval The IRBs at all participating institutions (University of Washington, Harvard Vanguard, Harvard Pilgrim, Henry Ford Health System, and Kaiser-Permanente Northern California) reviewed and approved the protocols for the BOLD Registry and the Observational Cohort Study of Early Imaging.

Subject Eligibility We identify patients at their health care visits for back pain using International Classification of Diseases, Ninth Revision (ICD-9) codes.[8] We recruit subjects from both primary care clinics and urgent care/emergency care settings. Since our aim is to evaluate treatment effectiveness (how an intervention performs in the real world) rather than efficacy (how an intervention performs under ideal conditions), our inclusion criteria are as broad as possible. Our inclusion and exclusion criteria are listed in Table 1.

Patient Identification We screen for study eligibility patients ≥ 65 years old who have had a primary care visit (including by telephone) or urgent/emergency care visit and been assigned a diagnosis code indicating back pain within the past three weeks (Table 2). In addition to patient encounters with physicians, we also include patients who have had encounters with non-physician primary care providers (registered nurses, nurse practitioners and physician assistants). We select for patients with relatively new onset episodes of back pain by excluding those with visits for back pain in the previous 6 months.
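
The electronic pre-screen described above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name, its arguments, and the approximation of "6 months" as 183 days are hypothetical and are not taken from the BOLD software or from Tables 1–3.

```python
from datetime import date, timedelta

# Thresholds taken from the text; names are hypothetical.
MIN_AGE_YEARS = 65
RECRUITMENT_WINDOW = timedelta(weeks=3)
LOOKBACK_DAYS = 183  # assumption: "6 months" approximated as 183 days


def is_potentially_eligible(age_years, index_visit, prior_back_pain_visits, today):
    """Return True if a patient passes this simplified electronic pre-screen."""
    if age_years < MIN_AGE_YEARS:
        return False
    # The index back pain visit must fall within the past three weeks.
    elapsed = today - index_visit
    if not (timedelta(0) <= elapsed <= RECRUITMENT_WINDOW):
        return False
    # Exclude patients with a back pain visit in the 6 months before the index visit.
    lookback_start = index_visit - timedelta(days=LOOKBACK_DAYS)
    return not any(lookback_start <= visit < index_visit
                   for visit in prior_back_pain_visits)
```

Site staff would still verify the full inclusion and exclusion criteria with the patient; the sketch covers only the age, recency, and new-episode rules stated in this paragraph.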

Patient Enrollment The exact method of the initial subject contact varies slightly at each site. The site research staff identify and contact potential subjects by telephone, email, mail or in person, describing the study and inviting them to participate using a standardized script. In the invitation we provide a web address that has additional information about the study.

If the patient agrees, the research staff determines eligibility, verifying inclusion/exclusion criteria that were assessed during the query of the electronic health information system (Table 3). Patients provide verbal assent for their participation in the registry, which includes access to their medical records.

We offer subjects a $10 gift card or check for each completed interview (baseline and 3, 6, and 12 months later). The total time for completing the study questionnaires at each assessment is approximately 15–30 minutes.

Data Collection Our data come primarily from two sources: subject questionnaires and electronic data records. We have attempted to minimize the questionnaire burden while still obtaining important information regarding the patient's back pain. At baseline, trained research coordinators/interviewers administer the questionnaires either in person or over the telephone within three weeks of a subject's index primary care visit.

Follow-up We contact each registry patient at three, six and 12 months after baseline to collect data on patient treatments and outcomes. For follow-ups, the questionnaires are either self-administered by the subject using a mailed hard copy or administered by a research coordinator over the telephone. We plan to develop an on-line version of the questionnaire as an option for subjects to complete after being sent a link by email.

Follow-up questionnaires can be completed within a two-week window on either side of the follow-up time-point. We use a computerized tracking system to identify when patients enter the interview window and when interviews are complete. If patients withdraw from the study, we attempt to identify the reason.
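
As an illustration of how a tracking system might compute the interview windows described above, the following Python sketch derives the opening, target, and closing dates for each follow-up. It assumes that follow-up time-points are counted in calendar months from the baseline date; the function names are hypothetical and do not describe the actual BOLD tracking system.

```python
from datetime import date, timedelta

FOLLOW_UP_MONTHS = (3, 6, 12)
WINDOW = timedelta(weeks=2)  # two weeks on either side of the time-point


def add_months(start, months):
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    if month == 12:
        first_of_next = date(year + 1, 1, 1)
    else:
        first_of_next = date(year, month + 1, 1)
    last_day = (first_of_next - timedelta(days=1)).day
    return date(year, month, min(start.day, last_day))


def follow_up_windows(baseline):
    """Map each follow-up month to its (open, target, close) dates."""
    windows = {}
    for months in FOLLOW_UP_MONTHS:
        target = add_months(baseline, months)
        windows[months] = (target - WINDOW, target, target + WINDOW)
    return windows
```

For a baseline interview on 15 January 2012, the three-month window under these assumptions runs from 1 April to 29 April 2012.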

Baseline and Follow-up Measures We collect demographic information and information regarding back pain duration and back pain recovery expectations at baseline. We also administer the following measures at each assessment: 1) Roland-Morris Disability Questionnaire (RMDQ),[9] modified slightly to indicate disability due to back or leg pain (sciatica); 2) 0–10 numerical rating scales (NRS) of average back and leg pain in the past 7 days; 3) Brief Pain Inventory activity interference scale;[10,11] 4) Patient Health Questionnaire (PHQ)-4 Depression and Anxiety screen;[12] 5) the EuroQol-5D (EQ-5D);[13] and 6) Behavioral Risk Factor Surveillance System (BRFSS) survey (2 questions about falls).[14] We repeat the same measures at each follow-up time-point except for the duration of pain and patient recovery expectation questions.

Baseline Descriptive Measures: Pain Duration: We ask subjects at baseline to categorize the length of the current episode of back or leg pain (sciatica) as follows: 1) less than 1 month; 2) 1–3 months; 3) 3–6 months; 4) 6–12 months; 5) 1–5 years; and 6) more than 5 years.

Patient Expectations: We ask subjects to use a 0–10 NRS to rate their confidence that their pain will be completely gone or much better in 3 months.

Primary Outcome Measure: Roland-Morris Disability Questionnaire: Our primary outcome measure is the Roland-Morris Disability Questionnaire (RMDQ),[9] a back pain-specific functional status questionnaire adapted from the generic Sickness Impact Profile (SIP).[15] The original version consists of 24 yes/no items, which represent common dysfunctions in daily activities experienced by patients with back pain.[9] We use a slightly modified version of the questionnaire in which we add "or leg (sciatica)" to the words "back pain" where appropriate. A single score is derived by summing the items endorsed by the respondent, with higher scores indicating worse function. Both the original and modified RMDQ have proven to be more responsive to change over time than most subscales of the SF-36[16] or disability day questions from national health surveys.[16] Its internal consistency is excellent.[17] Its construct validity is supported by significant associations in the expected directions with symptom severity, neurologic deficits, opioid medication use, work absenteeism, and other measures of health status (subscales of the SF-36, disability days).[18,19] The RMDQ was the measure most responsive to clinical changes over time in the Maine Low Back Pain Cohort study.[16]
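
The scoring rule described above (one point per endorsed item across 24 yes/no items) can be sketched in Python as follows. Rejecting incomplete questionnaires is an assumption made for this illustration; the text does not state how missing items are handled.

```python
RMDQ_ITEM_COUNT = 24


def score_rmdq(responses):
    """Sum the endorsed items; higher scores indicate worse function.

    `responses` is a sequence of 24 booleans (True = item endorsed).
    Assumption: incomplete questionnaires are rejected, not imputed.
    """
    if len(responses) != RMDQ_ITEM_COUNT:
        raise ValueError(f"expected {RMDQ_ITEM_COUNT} items, got {len(responses)}")
    return sum(1 for endorsed in responses if endorsed)
```

The resulting score ranges from 0 (no items endorsed) to 24 (all items endorsed).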

Additional Patient-reported Measures: Pain Numerical Rating Scale (NRS): We ask subjects to rate separately their average back and leg pain within the past seven days on 0–10 scales, with 0 = no pain and 10 = worst pain imaginable. Investigators commonly use NRSs of pain intensity as outcomes in clinical trials of pain therapies, and these ratings have been demonstrated to be valid, reliable, and sensitive to change in pain intensity after treatment.[20] The Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) recommended a 0–10 NRS measure of pain intensity as a core outcome measure in pain clinical trials and noted that NRS measures had advantages over visual analogue scale (VAS) measures, including ability to be administered by telephone, preference by patients, and less missing and incomplete data.[21] Further, older adults may have difficulty completing VAS measures.[20] The IMMPACT group also recommended that clinical trials report the percentage of patients obtaining reductions in pain intensity from baseline of at least 30% on the NRS, and suggested that investigators may also wish to report the percentages of patients obtaining reductions in pain intensity of at least 50%. We plan to use both of these indicators of clinically meaningful change.
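
The two responder thresholds mentioned above (reductions of at least 30% and at least 50% from baseline) could be computed along the following lines. The function names are hypothetical, and returning an undefined result for a baseline score of 0 is an assumption of this sketch, since a percentage reduction cannot be calculated in that case.

```python
def percent_reduction(baseline_nrs, follow_up_nrs):
    """Percentage reduction in pain from baseline; None if baseline is 0."""
    if baseline_nrs <= 0:
        return None  # assumption: undefined when there is no baseline pain
    return 100.0 * (baseline_nrs - follow_up_nrs) / baseline_nrs


def responder_flags(baseline_nrs, follow_up_nrs):
    """Flag whether the 30% and 50% reduction thresholds are met."""
    pct = percent_reduction(baseline_nrs, follow_up_nrs)
    if pct is None:
        return {"reduction_30": None, "reduction_50": None}
    return {"reduction_30": pct >= 30.0, "reduction_50": pct >= 50.0}
```

For example, a patient whose rating falls from 8 to 5 has a 37.5% reduction and meets the 30% threshold but not the 50% threshold.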

Pain Interference The validated Brief Pain Inventory (BPI) Interference scale measures pain interference with activities.[11] The scale consists of 7 ratings (0–10) of how much back pain interferes with the following: general activity, mood, ability to walk, normal work, relations with other people, sleep and enjoyment of life.

Patient Health Questionnaire-4 Depression and Anxiety Screen The PHQ-4 is a four-item screen for depression and anxiety that has good sensitivity and specificity for identifying depression and anxiety disorders.[22]

EQ-5D The EQ-5D is a standardized health outcome instrument consisting of five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression). In addition, the instrument includes a "feeling thermometer" to assess respondents' current health-related quality of life (0–100). The EQ-5D has been extensively validated and studied for a wide variety of conditions and populations, including the elderly, and is used as a utility measure in cost-effectiveness analyses.[23]

Behavioral Risk Factor Surveillance System (BRFSS) Falls The BRFSS Falls screen is a two-item questionnaire that assesses the number of falls the respondent has had in the past 3 weeks and how many resulted in injury.[24–29]

Additional Data In addition to the patient-reported outcome measures, we will use electronic medical record and administrative data that are available at these integrated health systems. These health systems have standardized administrative and clinical data collected across their systems (Figure 1). We will generate queries of each health system's information system to acquire demographic, pharmacy, laboratory, vital sign, and provider data. Table 4 contains a list of variables that we plan provisionally to obtain from each site for each subject.

Figure 1. Virtual Data Warehouse data elements.

Data Management For all interviews, research coordinators enter data on specially formatted, paper data collection forms that are stored securely at each site. In addition, sites have the option of entering data directly into the on-line REDCap data system.[30] This has the added advantage of automated range and logic checks that reduce data entry errors. The study has two classes of data: 1) data containing protected health information (PHI) that is only stored locally at each site on a secure server; and 2) a limited data set with dates of service but no other PHI that is uploaded to a central database at the DCC using a web interface.

Research assistants check data from the interviews for missing or unclear responses while the subject is still available. The data coordinating center's senior programmer directs the data management and re-checks the data for quality. We defined specific logic rules for establishing the internal consistency of responses across several variables. When necessary, we check the original data collection forms or re-contact the subject.
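
To illustrate what range and logic checks of this kind might look like, the sketch below validates a single interview record in Python. The field names and the specific logic rule (injurious falls cannot exceed total falls) are hypothetical examples chosen for this illustration; they are not the study's actual rules or REDCap configuration. The ranges follow the scales described earlier in this article.

```python
# Hypothetical field names; ranges follow the measures described in the text.
RANGES = {
    "rmdq_score": (0, 24),
    "back_pain_nrs": (0, 10),
    "leg_pain_nrs": (0, 10),
    "eq5d_thermometer": (0, 100),
}


def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field, (low, high) in RANGES.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside {low}-{high}")
    # Example logic rule (assumption): injurious falls cannot exceed total falls.
    falls = record.get("falls_count")
    injurious = record.get("falls_with_injury")
    if falls is not None and injurious is not None and injurious > falls:
        problems.append("falls_with_injury exceeds falls_count")
    return problems
```

An empty list means the record passed every check; any entries would prompt a review of the paper form or a call back to the subject.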

Analytic Approach In many registries, there may be a relatively small subgroup of interest (e.g., a treatment or diagnostic testing prevalence of 1 or 2%) and a large number of controls available for a comparative analysis of outcomes. As a general analytic approach in BOLD, we will evaluate cases in comparative studies using 3:1 matched controls with the nearest propensity score.[31]

Propensity-based matching is a strategy for assembling similar groups of patients in the absence of randomization. We will use this method to select control patients who are similar to patients who have selectively received the intervention of interest (e.g. early imaging). If we do not appropriately match controls on important baseline characteristics to patients who receive the intervention, there is a risk of obtaining biased results when comparing cases with controls due to confounding.[31–37] Propensity score matching aims to provide a valid estimate of the intervention effect by comparing patients who have and have not had the intervention and who also have similar observed characteristics.

For a given research question and set of patients, we will use logistic regression models (or multi-level logistic regression models for ordinal or multilevel treatments; e.g., levels of treatment dose) to generate a propensity score for each person, using variables that are significant predictors of the intervention of interest. We will then match the propensity score of each case who received the intervention to the nearest propensity scores (up to 3) available among control patients whose propensity scores lie within a caliper window of 0.2σ of the case index, where σ is a measure of variation in the propensity score distributions of cases and controls as given in Rosenbaum and Rubin.[35] If no control propensity scores fall within the caliper window of a particular index case, then the index case will be excluded from any analyses.
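
The caliper matching step described above can be sketched in Python once propensity scores have been estimated. This is a simplified illustration, not the study's implementation. It assumes that σ is the pooled standard deviation of the scores in the two groups, that matching is greedy in the order the cases are listed, and that each control is used at most once; none of these details is specified in the text.

```python
import math
from statistics import variance


def pooled_sd(case_scores, control_scores):
    """Assumed definition of sigma: pooled SD of the two score distributions."""
    return math.sqrt((variance(case_scores) + variance(control_scores)) / 2.0)


def match_controls(case_scores, control_scores, max_controls=3, caliper_multiplier=0.2):
    """Greedy nearest-neighbor matching within a caliper, without replacement.

    Returns {case_index: [control_index, ...]}. Cases with no control inside
    the caliper are omitted, mirroring their exclusion from analysis.
    """
    caliper = caliper_multiplier * pooled_sd(case_scores, control_scores)
    available = set(range(len(control_scores)))
    matches = {}
    for case_index, case_score in enumerate(case_scores):
        candidates = sorted(
            (abs(control_scores[j] - case_score), j)
            for j in available
            if abs(control_scores[j] - case_score) <= caliper
        )
        chosen = [j for _, j in candidates[:max_controls]]
        if chosen:
            matches[case_index] = chosen
            available.difference_update(chosen)
    return matches
```

Greedy matching is order-dependent, so an optimal matching algorithm could produce a different assignment; the sketch is meant only to make the caliper and the "up to 3" rule concrete.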

Observational Cohort of Early Imaging

We will conduct an observational cohort study of early imaging in seniors with new visits for back pain as our first comparative study using data from the BOLD registry. Our goal is to test the hypothesis that imaging of the lumbar spine within 6 weeks of the index visit (early imaging) is associated with worse patient outcomes and increased health care utilization and costs. Patients who get early imaging may be those with the worst pain or most alarming clinical presentation. However, given the variability in clinician ordering patterns, there is also a reasonable likelihood that those patients who do and do not get early imaging have considerable overlap.

Prior work has suggested an association between early imaging and subsequent interventions[38,39] but lacked the statistical power to detect a significant association.

Subject Eligibility All subjects enrolled in the registry will be eligible for the observational study of early imaging. Cases selected for the observational cohort will be registry patients who had early imaging of the lumbar spine. We will identify propensity score-matched controls from the registry (see below) who did not have early imaging of their spine.

Analytic Approach to the Observational Cohort of Early Imaging Our overall aim for the observational cohort study is to compare the pain, function, and resource utilization and associated costs of patients who have early (within six weeks of index medical visit) imaging (radiographs, magnetic resonance imaging (MRI), computed tomography (CT) and bone scans) to those who do not have early imaging. The sample will consist of registry patients with new episodes of back pain. Our primary hypothesis is that patients who undergo early imaging will have worse modified RMDQ scores at one year compared with those who do not receive early imaging, after controlling for baseline back pain-related disability, pain severity and pain duration. Our rationale is that imaging may lead to adverse labeling[40] or more interventions (injections, surgery),[39] with resultant complications. We will also test the hypothesis that early-imaged subjects undergo more invasive and more resource-intensive subsequent interventions than those who do not.

Matching We will construct a propensity score based upon the logit function of the probability of receiving early imaging (i.e., the log odds) for a patient with specific characteristics or prognostic factors.[37] We will use fixed matching of age (5-year strata), sex (male/female), and race (Caucasian/African American/other) in the generation of the propensity score and include candidate baseline covariates such as other co-morbidities or diagnoses identified at baseline, modified RMDQ score, and pain intensity rating. Patients receiving early imaging will be matched to the closest control whose propensity score differs by less than 0.2σ among those patients within five years of age.
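
Two ingredients of the matching description above, the logit transform and the fixed matching strata, can be written down directly. In this Python sketch the function names are hypothetical, and starting the 5-year age strata at age 65 is an assumption based on the registry's age criterion rather than a detail stated in the text.

```python
import math


def logit(probability):
    """Log odds of a probability strictly between 0 and 1."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return math.log(probability / (1.0 - probability))


def age_stratum(age_years, width=5, floor=65):
    """Lower bound of the 5-year age stratum (assumed to start at 65)."""
    return floor + width * ((age_years - floor) // width)


def exact_match_key(age_years, sex, race):
    """Key on which a case and its candidate controls must agree exactly."""
    return (age_stratum(age_years), sex, race)
```

Under these assumptions a 72-year-old falls in the 70–74 stratum, and a predicted probability of 0.5 corresponds to a logit of 0.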

Primary Analysis Our primary outcome measure is back-specific disability measured by the RMDQ at 12 months. We have selected the 12-month assessment as the primary outcome because this allows adequate time for any intervention benefit to manifest, and is the final assessment opportunity for the initial registry study design.

We will first assess comparability of baseline characteristics between the matched groups to gauge the effectiveness of the propensity matching and then address any residual covariate imbalances through model adjustment. Rosenbaum and Rubin suggested that an approach combining both the propensity score and covariate adjustment is superior to the use of either strategy alone.[41]

Using the propensity-matched pairs, we plan to use a paired t-test to compare the between-group 12-month change in RMDQ. In conjunction with this primary analysis, we will use multivariate linear regression models adjusting for the propensity score and baseline factors that appear to have residual imbalance in order to compare groups with and without early imaging.
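
For readers less familiar with the paired comparison named above, the following Python sketch computes the paired t statistic from within-pair differences in RMDQ change. It assumes one control per case and complete data for both members of each pair; the function name and the example values in the usage note are illustrative only and are not study data.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(case_changes, control_changes):
    """Return (t, degrees_of_freedom) for within-pair differences.

    Each list holds the 12-month change in RMDQ score; position i in both
    lists refers to the same matched pair.
    """
    if len(case_changes) != len(control_changes):
        raise ValueError("each case needs exactly one matched control")
    differences = [a - b for a, b in zip(case_changes, control_changes)]
    n = len(differences)
    if n < 2:
        raise ValueError("at least two pairs are required")
    standard_error = stdev(differences) / math.sqrt(n)
    if standard_error == 0:
        raise ValueError("all within-pair differences are identical")
    return mean(differences) / standard_error, n - 1
```

With made-up changes of [-2, -4, -1, -3] for cases and [-4, -5, -4, -4] for controls, the within-pair differences are [2, 1, 3, 1], giving t of about 3.66 on 3 degrees of freedom.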

We will use multivariate linear regression models adjusting for the propensity score or conditional logistic models to identify predictors of patient outcome at the one-year follow-up. We will use interaction terms between the early imaging and baseline characteristics to identify variables that predict differences in the outcome associations between the two groups.

We will include subjects who have subsequent imaging more than six weeks after entry to the study in the non-early imaging group. We will compare characteristics of subjects who receive later imaging to those who do not in a sensitivity analysis.

Secondary Analyses We will conduct similar analyses for the RMDQ at three and six months as well as for the pain NRS and EQ-5D using all data through one year. We will use methods appropriate for the analysis of repeated measures such as linear mixed models or repeated measures ANCOVA,[42] adjusting for the propensity score. We will assess binary secondary outcomes such as achievement of a 30% reduction in pain using conditional logistic regression models.

Using the patient-reported data and the health systems' electronic information systems, we will enumerate the number and type of invasive interventions that patients undergo following enrollment. These interventions are listed in Table 5. We will use fixed effects conditional Poisson regression models to compare adjusted spinal surgery rates between those patients who did or did not receive early imaging, conditional on matched pair.[43] In addition, we will examine the time to first invasive intervention using survival analysis with a Cox proportional hazards model and adjust for the propensity score for early imaging.

Another hypothesis is that racial and ethnic minorities will have lower rates of early imaging than non-minorities. To test this hypothesis, we will use the registry to compare rates of early imaging between African Americans/Blacks and Whites as well as between Hispanics and Whites. We will test for differences in rates using fixed effects conditional Poisson regression models, controlling for the propensity score and residual imbalances among important covariates. We will also examine subsequent invasive interventions as well as outcomes in each of these ethnic and racial subgroups. If early imaging rates are indeed lower in racial and ethnic minorities, we would expect subsequent invasive interventions to be fewer and functional status better.

Economic Analysis The primary economic hypothesis is that patients receiving early imaging will have higher health care utilization, higher costs, and worse outcomes at one year compared to those not receiving early imaging. The primary economic outcome will be one-year incremental cost per quality-adjusted life year (QALY) gained from the private/public payer perspective.[44]
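
The primary economic outcome named above, incremental cost per QALY gained, is conventionally the difference in mean cost divided by the difference in mean QALYs between two strategies. The Python sketch below shows that calculation; the function name is hypothetical and the figures in the usage note are invented for illustration.

```python
def incremental_cost_per_qaly(cost_new, cost_reference, qaly_new, qaly_reference):
    """Incremental cost-effectiveness ratio; None when QALYs do not differ."""
    delta_cost = cost_new - cost_reference
    delta_qaly = qaly_new - qaly_reference
    if delta_qaly == 0:
        return None  # ratio is undefined without a difference in effect
    return delta_cost / delta_qaly
```

With illustrative one-year means of $12,000 and 0.75 QALYs versus $10,000 and 0.70 QALYs, the ratio is about $40,000 per QALY gained. If a strategy turns out to be both more costly and less effective, as hypothesized here for early imaging, the ratio is negative and the strategy is usually described as dominated rather than summarized by the ratio.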

The cost-effectiveness assessment will use the health systems' electronic medical records and administrative data as well as patient-reported outcome data. We will use the electronic data to assess within-health system categories of resource utilization (e.g., office visits, procedures, surgeries, tests, medications). We will use the Marketscan® data warehouse to obtain an estimate of 2012 private payer average unit costs for medications and medical procedures/services.

We will report short-term costs and consequences (baseline to 3 months) and assess six-month and one-year outcomes incorporating the linear mixed-model approach used in the primary outcomes analysis. Sensitivity and specific scenario analyses will be undertaken to evaluate uncertainty on cost-effectiveness parameters.[45]

Sample Size Prior studies suggest that approximately 15%–30% of back pain patients will have early imaging of the lumbar spine.[46] Given a registry size of 5,000 subjects, we expect 750–1,500 patients in the BOLD registry will have early imaging and comprise cases for the observational matched cohort study.
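
The expected number of cases follows directly from the registry size and the anticipated imaging rates. A short Python check of that arithmetic, using integer percentages to avoid rounding issues:

```python
REGISTRY_SIZE = 5000
EARLY_IMAGING_PERCENT_RANGE = (15, 30)


def expected_cases(registry_size, percent):
    """Expected number of early-imaging cases at a given percentage."""
    return registry_size * percent // 100


low_cases, high_cases = (
    expected_cases(REGISTRY_SIZE, percent) for percent in EARLY_IMAGING_PERCENT_RANGE
)
print(low_cases, high_cases)  # 750 1500
```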

In a matched study, missing data at follow-up in either the case or matched control imply that neither patient's data will be included in a matched analysis. That is, if we anticipate 10–15% loss to follow-up equally balanced between comparison groups, the number of missing data points can be as much as doubled in any matched or conditional analysis. To compensate for this, we will enrich the control sample with 3:1 matched sampling so that each case will have up to three controls followed in an identical manner. In Table 6, we see that this number of patients offers adequate power to detect minimally clinically relevant differences in functional and pain outcomes, as well as important differences in rates of surgery, complications, or adverse events. Given that one of our enrolling sites (Kaiser Permanente Northern California) is much larger than the other sites, we anticipate enrolling approximately three times as many subjects from KPNC as from each of the other two sites, or 3,000 vs. 1,000 subjects per site.
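
The effect of loss to follow-up on a matched analysis, and the benefit of following up to three controls per case, can be illustrated with a short Python calculation. It assumes that losses occur independently and at the same rate for cases and controls, which is a simplification made for this sketch rather than a claim from the text.

```python
def usable_fraction(loss_rate, controls_per_case=1):
    """Fraction of cases with follow-up data for the case and at least one control.

    Assumes independent, equal loss to follow-up for every subject.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    case_retained = 1.0 - loss_rate
    any_control_retained = 1.0 - loss_rate ** controls_per_case
    return case_retained * any_control_retained
```

With 1:1 matching and 10% loss, about 81% of pairs remain usable, so roughly 19% are lost, nearly double the individual loss rate. With three controls per case the usable fraction rises to about 90%, close to the retention of the cases alone.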

An important advantage of a registry is the ability to detect relatively rare events due to the large sample size. We base our sample size estimates on the ability to detect and make inference on relatively rare events. In the primary care setting, examples of rare events would be subsequent surgery or adverse outcomes from interventions such as epidural steroid injections. While our first planned use of the registry is for the comparative effectiveness evaluation of early imaging vs. no-early imaging in the elderly, we envision other evaluations such as the comparative effectiveness of physical therapy vs. no physical therapy.

Data Access As the registry progresses in size and maturity, we anticipate making the BOLD resources available to researchers interested in evaluating diagnostic tests, treatments, and outcomes among elderly patients with back pain. Detailed information regarding data sharing will be available at