Advanced Practice Nurse Outcomes 1990–2008

A Systematic Review

Robin P. Newhouse, PhD, RN, NEA-BC; Julie Stanik-Hutt, PhD, ACNP, CCNS, FAAN; Kathleen M. White, PhD, RN, NEA-BC, FAAN; Meg Johantgen, PhD, RN; Eric B. Bass, MD, MPH; George Zangaro, PhD, RN; Renee F. Wilson, MS; Lily Fountain, MS, CNM, RN; Donald M. Steinwachs, PhD; Lou Heindel, DNP, CRNA; Jonathan P. Weiner, PhD


Nurs Econ. 2011;29(5):230-250. 



Design. A systematic review was conducted following processes specified for Evidence Based Practice Centers funded by the Agency for Healthcare Research and Quality, and guided by an expert co-investigator. Processes were designed to identify and select relevant studies; review, rate, and grade the individual studies; and synthesize the results for outcomes with a sufficient number of studies. Teams were developed for each of the APRN groups, led by a co-investigator. Five Technical Expert Panels (TEPs) were convened: one for each of the APRN groups and one methods panel to review the report of the overall project.

Search methods. The following databases were searched systematically: PubMed, Cumulative Index to Nursing and Allied Health Literature (CINAHL), and ProQuest. For each APRN group, specific search strategies were developed with the assistance of a medical librarian and the four APRN role-specific TEPs. The search strategy was intentionally broad to improve search sensitivity.

Studies were included if they were randomized controlled trials (RCTs) or observational studies comparing at least two groups of providers (e.g., APRNs working alone or in a team compared to other individual providers working alone or in teams without an APRN), were conducted in the United States between 1990 and 2008, and reported quantitative data on patient outcomes. Studies prior to 1990 were not included because practice and interventions have since changed in both their scientific basis and the organization of health care providers. Studies were excluded if they were non-English, included no quantitative data, or contained only outcomes that could not be affected by APRNs. For example, if the intervention included free medications for one group only, the outcomes could not be attributed to the care of the APRN alone. Only U.S. studies were included because: (a) the education for and implementation of advanced practice roles and scope of practice are different in the United States compared to other countries; and (b) the health care system in the United States (including health care access, health insurance, and costs of care) is very different from health care systems in other countries.
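The screening rules above can be expressed as a simple checklist. The following is a minimal sketch for illustration only, not the software used in the review: the `Study` record, its field names, and the reason labels are hypothetical, and `year` is assumed to be the year the study was conducted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Study:
    """Hypothetical screening record; field names are illustrative only."""
    design: str                  # e.g. "rct", "observational", "case report"
    provider_groups: int         # number of provider groups compared
    country: str                 # e.g. "US"
    year: int                    # assumed: year the study was conducted
    language: str                # e.g. "en"
    has_quantitative_outcomes: bool
    outcomes_attributable: bool  # could the outcomes be affected by APRN care?


def exclusion_reasons(study: Study) -> List[str]:
    """Return every reason a study fails screening (empty list = include)."""
    reasons = []
    if study.design not in {"rct", "observational"}:
        reasons.append("design")
    if study.provider_groups < 2:
        reasons.append("fewer than two provider groups")
    if study.country != "US":
        reasons.append("non-U.S.")
    if not 1990 <= study.year <= 2008:
        reasons.append("outside 1990-2008")
    if study.language != "en":
        reasons.append("non-English")
    if not study.has_quantitative_outcomes:
        reasons.append("no quantitative data")
    if not study.outcomes_attributable:
        reasons.append("outcomes not attributable to APRN care")
    return reasons
```

Returning a list rather than a single yes/no answer allows more than one exclusion reason to be recorded for the same study.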

Search outcome. Figure 1 depicts the summary of the literature search results and article inclusion and exclusion at each level. A multi-step process was used to conduct the review, proceeding from titles to abstracts and then the full articles. At each step, the citation was reviewed and, if judged to not meet inclusion criteria, the reasons for exclusion were documented. Web-based database software facilitated access to studies and citation management. Standardized abstract forms included in the web-based software were developed by the team specifically for this project.

Figure 1.

Summary of Literature Search and Number of Articles
* Reason for study exclusion can be attributable to more than one category.

Data abstraction. Titles, abstracts, and full articles were reviewed by two independent reviewers and included or excluded according to the criteria listed previously. A primary reviewer completed all of the relevant data abstraction forms. The second reviewer checked the first reviewer's data abstraction forms for completeness and accuracy. Reviewer pairs were formed to include personnel with both clinical and methodological expertise. The reviews were not blinded in terms of the articles' authors, institutions, or journal. As with article inclusion, differences of opinion that could not be resolved between the reviewers were resolved through consensus adjudication. If articles were deemed to meet inclusion criteria by both reviewers, they were included in the final data abstraction.

Quality assessment. Once a final set of studies was determined, the quality of each individual study was assessed using a modified scale informed by the Jadad scale (Jadad et al., 1996). Table 1 includes the quality assessment criteria. Since the Jadad scale was designed for RCTs (e.g., use of double-blinding), additional quality criteria were constructed to account for the observational studies represented in this review (e.g., similarity of groups and settings, group sample sizes, sources of bias). The additional quality criteria included comparability of participants and settings, sample size, reliability and validity of measures, bias control, and attribution of outcome to APRN. Attribution of the outcome to the APRN was assessed by considering (a) whether the APRN worked independently, as a team member, or under direct supervision; and (b) whether the outcome was directly linked to APRN care.

Study quality was assessed by agreement of at least two team members using an eight-point scale. A score was assigned for each item only if the specific criterion was completely satisfied. Two reviewers independently rated the quality of each study, discussed the items on which they disagreed, and reached consensus. A score of ≥ 5 was considered high quality, and a score of ≤ 4 was considered low quality.
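The scoring rule can be illustrated with a short sketch. It reads the high-quality threshold as a score of 5 or more, since scores of 4 or less are low quality. It also assumes one point per fully satisfied item, for a maximum of eight; the item keys and function names are hypothetical and do not reproduce Table 1.

```python
HIGH_QUALITY_MIN = 5  # scores of 4 or less are treated as low quality
MAX_SCORE = 8         # assumption: one point per item on the eight-point scale


def disagreements(rater_a: dict, rater_b: dict) -> list:
    """Items the two reviewers scored differently and therefore must discuss."""
    return sorted(item for item in rater_a if rater_a[item] != rater_b.get(item))


def classify_quality(consensus: dict) -> str:
    """Sum consensus item scores (1 = criterion fully satisfied, otherwise 0)."""
    total = sum(consensus.values())
    if not 0 <= total <= MAX_SCORE:
        raise ValueError("score outside the eight-point scale")
    return "high" if total >= HIGH_QUALITY_MIN else "low"
```

For example, a study whose consensus ratings satisfy five of the items would be classified as high quality, and one satisfying four would be classified as low quality.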

Data synthesis and analysis. A set of detailed evidence tables was created for each APRN group. Information extracted from the eligible studies was rechecked against the original articles for accuracy. If there was a discrepancy between the data abstracted and the data appearing in the article, this discrepancy was addressed by the investigator in charge of the APRN-specific data set and the data were corrected in the final evidence tables.

Outcomes were aggregated for each APRN group when there was a minimum of three studies with the same outcome. The decision to aggregate only outcomes reported in at least three studies was based on the rationale that: (a) one or two studies do not provide adequate evidence to summarize results or assess a body of evidence; and (b) this systematic review was intentionally broad to assess all APRN outcomes, rather than a few outcomes as is common in most systematic reviews.
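The three-study minimum amounts to a grouping-and-filtering step. The sketch below is illustrative only; the record layout (APRN group, outcome, study identifier) and the function name are assumptions, not the structure of the review's evidence tables.

```python
from collections import defaultdict

MIN_STUDIES = 3  # minimum number of studies reporting the same outcome


def outcomes_to_aggregate(records):
    """Group (aprn_group, outcome, study_id) records and keep outcomes
    reported by at least MIN_STUDIES distinct studies."""
    studies = defaultdict(set)
    for aprn_group, outcome, study_id in records:
        studies[(aprn_group, outcome)].add(study_id)
    return {
        key: sorted(ids)
        for key, ids in studies.items()
        if len(ids) >= MIN_STUDIES
    }
```

Using a set per outcome means that a study reporting the same outcome more than once is counted only once toward the minimum.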

Grading of evidence. At the completion of the abstraction and the rating of study quality, the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) Working Group criteria (Atkins et al., 2004) were applied to the overall evidence for each aggregated outcome.

Evidence was first classified into one of four baseline categories: high, moderate, low, or very low. A high baseline category was designated if there were at least two RCTs, or one RCT and two high-quality observational studies. A moderate baseline category was designated if there were either (a) one RCT, one high-quality observational study, and one low-quality observational study; or (b) three high-quality observational studies. A low baseline category was designated if there were fewer than three high-quality observational studies.
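The baseline rules can be read as an ordered set of checks. The sketch below is one possible reading, not the authors' procedure: it treats the stated counts as minimums, checks the rules from high to low, and never returns "very low" because the text gives no baseline rule for that category. The function and parameter names are hypothetical.

```python
def baseline_category(rcts: int, high_obs: int, low_obs: int) -> str:
    """Assign a baseline evidence category from study counts.

    rcts     -- number of randomized controlled trials
    high_obs -- number of high-quality observational studies
    low_obs  -- number of low-quality observational studies
    """
    # High: at least two RCTs, or one RCT plus two high-quality observational studies.
    if rcts >= 2 or (rcts >= 1 and high_obs >= 2):
        return "high"
    # Moderate: one RCT with one high- and one low-quality observational study,
    # or three high-quality observational studies (counts assumed to be minimums).
    if (rcts >= 1 and high_obs >= 1 and low_obs >= 1) or high_obs >= 3:
        return "moderate"
    # Low: everything else, including fewer than three high-quality
    # observational studies.
    return "low"
```

Under this reading, an outcome supported by two high-quality and one low-quality observational study, with no RCT, would start in the low category.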

Next, the overall grading questions in Table 2 were applied to the body of research for each outcome. Table 3 includes the overall quality categories and definitions. An overall grade category was assigned by considering the number of studies, design, study quality, consistency of results, directness (extent to which results directly addressed the question), and likelihood of reporting bias.

The grade was decreased by one level for each question that received a positive answer. For example, if study results were inconsistent, outcomes with a baseline category of high would be reduced one level to moderate. The final strength-of-evidence grade was then assigned.
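The downgrading step can be sketched as a move along an ordered list of grades. This is an illustration under stated assumptions: the grading questions are represented only as a list of true/false answers (their wording is in Table 2), and stopping at "very low" once the bottom of the scale is reached is an assumption, as are the names used.

```python
LEVELS = ["high", "moderate", "low", "very low"]  # ordered from best to worst


def final_grade(baseline: str, positive_answers: list) -> str:
    """Lower the baseline category one level per positive answer.

    positive_answers -- one boolean per grading question; True means the
    question identified a concern (e.g. inconsistent study results).
    """
    steps_down = sum(bool(answer) for answer in positive_answers)
    index = LEVELS.index(baseline) + steps_down
    # Assumption: the grade cannot fall below the lowest category.
    return LEVELS[min(index, len(LEVELS) - 1)]
```

With a baseline of high and a single positive answer for inconsistent results, `final_grade("high", [True])` gives moderate, matching the example in the text.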

In grading the evidence, the direction of effects was evaluated as favoring APRNs, favoring the comparison group, or no significant difference. In many cases, showing equivalence of outcome was considered a good outcome, similar to equivalence trials where the aim is to show the therapeutic equivalence of two treatments (Jones, Jarvis, Lewis, & Ebbutt, 1996). This was the case when comparing care involving NPs, CRNAs, or CNMs with care involving only physicians.

Effect sizes were not calculated for the multiple outcomes; rather, the significance or non-significance reported by the authors was recorded. Calculating effect sizes for these multiple broad outcomes would be problematic for two reasons. First, for many outcomes the studies represent widely varying populations, definitions, time periods, and study designs. Second, the publications did not consistently include the data necessary to calculate effect size (e.g., Ns and standard deviations for sub-samples), since many of the studies were not designed specifically to compare APRNs with other providers.

A draft of the evidence report was reviewed by the five TEPs: one for each APRN category and one methodological TEP that included other stakeholders (consumer, statistician, and physician leader). Each TEP submitted written comments and recommendations that were addressed by the research team.


