The Quality and Effectiveness of Care Provided by Nurse Practitioners

Julie Stanik-Hutt, PhD, ACNP-BC; Robin P. Newhouse, PhD, NEA-BC; Kathleen M. White, PhD, NEA-BC; Meg Johantgen, PhD, RN; Eric B. Bass, MD, MPH; George Zangaro, PhD, RN; Renee Wilson, MS; Lily Fountain, MS, CNM; Donald M. Steinwachs, PhD; Lou Heindel, DNP, CRNA; Jonathan P. Weiner, DrPH


Journal for Nurse Practitioners. 2013;9(8):492-500.e13. 


The systematic approach used for this review included identifying and selecting relevant studies, reviewing and rating the individual studies, and then synthesizing findings on patient outcomes and grading the aggregated results. The project team comprised nurses, a physician, health services researchers, and experts on systematic reviews.

Data Sources and Searches

A sensitive search strategy was developed with the assistance of a science search library specialist and a technical expert panel (TEP) comprising NPs with expertise in professional practice, NP education, and outcomes review. A variety of terms used to refer to NPs (eg, advanced practice nurse, MD extender, nurse clinician, nurse consultant) were combined with the terms outcome, quality, safety, and effectiveness, and with a broad variety of other associated terms (eg, quality of care, costs, errors, malpractice), to search for articles. The search string with MeSH terms is listed in the main study report.[28,29] The following databases were searched systematically: ProQuest, Cochrane, PubMed, and the Cumulative Index to Nursing and Allied Health Literature.

Study Selection

Studies that met the following criteria were included: randomized controlled trial (RCT) or observational study of at least 2 groups of providers (eg, NP working alone or in a team compared to other individual providers working alone or in teams without an NP), carried out in the US between 1990 and 2009, with patient outcomes for quality, safety, or effectiveness reported.[28,29] Studies conducted outside the US were excluded because NP education, role implementation, and scope of practice differ in other countries, and because access, insurance, costs of care, and other characteristics of health care systems in other countries vary significantly from those in the US.

Studies in which NPs worked autonomously or in collaboration with MDs, as compared to MDs working autonomously or in collaboration with other MDs, were included with the knowledge that the critical difference between these 2 provider groups was the addition of the NP. Because provider practice and health care interventions change over time, studies prior to 1990 were excluded. Studies reporting only processes of care (eg, self-report of completion of selected patient assessments or care documentation) were not included because they measure care delivery and practice activities rather than actual health outcomes. Studies were also excluded if they were not published in English or failed to report quantitative data or outcomes that could reasonably be expected to be affected by NPs.
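The selection criteria above can be read as a checklist applied to each candidate study. The following is a minimal sketch of such a checklist, not the review team's actual screening tool. The `Candidate` record, its field names, and the reason labels are hypothetical, and treating `year` as a single value is an assumption, since the text does not say whether the 1990 to 2009 window refers to study conduct or publication. The function returns every reason that applies, because a study can fail more than one criterion.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical screening record; field names are illustrative only."""
    design: str                      # "rct", "observational", or something else
    provider_groups: int             # number of provider groups compared
    country: str                     # where the study was carried out
    year: int                        # assumed single reference year
    language: str                    # publication language
    reports_patient_outcomes: bool   # False if only processes of care are reported
    reports_quantitative_data: bool


def exclusion_reasons(c: Candidate) -> List[str]:
    """Return all exclusion reasons that apply; an empty list means 'include'."""
    reasons = []
    if c.design not in {"rct", "observational"}:
        reasons.append("not an RCT or observational study")
    if c.provider_groups < 2:
        reasons.append("fewer than 2 provider groups")
    if c.country != "US":
        reasons.append("conducted outside the US")
    if not 1990 <= c.year <= 2009:
        reasons.append("outside 1990-2009")
    if c.language != "English":
        reasons.append("not published in English")
    if not c.reports_patient_outcomes:
        reasons.append("no patient outcomes reported")
    if not c.reports_quantitative_data:
        reasons.append("no quantitative data")
    return reasons


if __name__ == "__main__":
    study = Candidate("observational", 2, "UK", 1985, "English", False, True)
    for reason in exclusion_reasons(study):
        print(reason)
```

Returning a list rather than a single yes/no flag keeps the tally of exclusion reasons consistent with the note that one study can be counted under more than 1 category.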

The review proceeded from titles to abstracts and then to the full articles following a sequential multi-step process (Figure 1). The Web-based database software TrialStat® was used to store and organize all citations, develop standardized abstraction forms for the review, and allow reviewers to access the studies. Two independent reviewers examined and determined, according to the criteria listed above, whether to include or exclude each title, abstract, and full article. If articles met inclusion criteria after examination by both reviewers, they were included in the final data abstraction. Differences of opinion regarding article eligibility were resolved through consensus adjudication.

Figure 1.

Summary of Literature Search (Number of Articles)
Note: Reason for study exclusion can be attributable to more than 1 category
APN = advanced practice nurse; CNS = clinical nurse specialist; CNM = certified nurse midwife; CRNA = certified registered nurse anesthetist; NP = nurse practitioner.

Data Extraction and Quality Assessment

After applying the criteria described above, a sequential review process was used to abstract data from remaining articles. Data abstraction forms were completed by the primary reviewer and checked for completeness and accuracy by the second reviewer. Personnel with both clinical and methodological expertise were included in reviewer pairs. The reviews were not blinded. Consensus adjudication was used if differences of opinion between the reviewers could not be otherwise resolved.

Quality assessment is used in a systematic review to examine potential threats from individual studies to the validity of the findings. The Jadad scale (designed for RCTs that use double-blinding, etc), which quantifies the presence or absence of certain design characteristics, is commonly used to assess quality.[30] A modified quality scale informed by the Jadad scale was developed to better assess the quality of studies (both RCTs and observational studies) represented in this review (eg, similarity of groups and settings, group sample sizes, potential sources of bias).[28,29]

The quality of each study was independently rated by 2 reviewers using the modified Jadad scale, and items scored differently by the 2 reviewers were discussed. The modified Jadad scale yielded scores ranging from 0 to 8. A study quality score of ≥ 5 was considered to be high quality, and a score of ≤ 4 was considered to be low quality. These categories were determined independent of score distribution and based on the judgment that a study scoring ≤ 4 was likely to represent high bias and low attribution. The same criteria and cut points were used for both RCT and observational studies.
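The cut points described above amount to a simple threshold rule. The sketch below illustrates it, assuming integer scores; the function names and the scale item names in the usage example are hypothetical and are not taken from the modified scale itself.

```python
def quality_category(score: int) -> str:
    """Classify a modified Jadad score (0 to 8): >= 5 is high, <= 4 is low."""
    if not 0 <= score <= 8:
        raise ValueError("modified Jadad scores range from 0 to 8")
    return "high" if score >= 5 else "low"


def items_to_discuss(reviewer_a: dict, reviewer_b: dict) -> list:
    """List the scale items that the 2 reviewers scored differently."""
    return sorted(
        item for item in reviewer_a
        if reviewer_a[item] != reviewer_b.get(item)
    )


if __name__ == "__main__":
    # Item names below are illustrative placeholders.
    a = {"similar_groups": 1, "sample_size": 0}
    b = {"similar_groups": 1, "sample_size": 1}
    print(quality_category(5), quality_category(4))
    print(items_to_discuss(a, b))
```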

Data Synthesis and Analysis

While studies reporting a broad range of outcomes were included, only outcomes that were reported by at least 3 studies were selected to aggregate. The study results for these outcomes were summarized. A 2-step process was then used to grade the strength of the evidence on the basis of its quantity and consistency. First, the strength of the evidence from the aggregated outcomes was assigned a baseline grade of high, moderate, low, or very low. The initial strength of evidence was graded as high if it was supported by at least 2 RCTs or by 1 RCT and 2 high-quality observational studies. The initial strength-of-evidence grade was moderate if supported either by 1 RCT, 1 high-quality observational study, and 1 low-quality observational study or by 3 high-quality observational studies. The initial strength-of-evidence grade was low when there were fewer than 3 high-quality observational studies.
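The baseline grading rules can be sketched as a small decision function. This is an illustrative reading of the paragraph above, not the authors' own procedure, and it rests on several assumptions: the rules are checked in order from high to low; the stated counts are treated as minimums; the quality of an RCT does not affect the baseline grade; and any aggregated outcome that meets neither the high nor the moderate rule is graded low. The text gives no rule for a baseline of very low, so the function never returns it. The `Study` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Study:
    """Hypothetical record for one study reporting a given outcome."""
    design: str          # "rct" or "observational"
    quality_score: int   # modified Jadad score, 0 to 8


def baseline_grade(studies: List[Study]) -> Optional[str]:
    """Baseline strength of evidence for one outcome, or None if not aggregated."""
    if len(studies) < 3:
        return None  # only outcomes reported by at least 3 studies were aggregated

    rcts = sum(1 for s in studies if s.design == "rct")
    hq_obs = sum(1 for s in studies
                 if s.design == "observational" and s.quality_score >= 5)
    lq_obs = sum(1 for s in studies
                 if s.design == "observational" and s.quality_score <= 4)

    if rcts >= 2 or (rcts >= 1 and hq_obs >= 2):
        return "high"
    if (rcts >= 1 and hq_obs >= 1 and lq_obs >= 1) or hq_obs >= 3:
        return "moderate"
    return "low"  # assumption: everything else falls here


if __name__ == "__main__":
    outcome = [Study("rct", 6), Study("observational", 5), Study("observational", 3)]
    print(baseline_grade(outcome))
```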

Strength of the aggregated evidence was graded a second time using adapted GRADE Working Group criteria.[31] This process provided a systematic, transparent, and "explicit approach to making judgments about the quality of evidence and the strength of recommendation."[31] The body of evidence for each outcome was graded using the adapted GRADE criteria, which included consideration of the number, design, and quality of the studies; consistency and directness of results (extent to which results directly addressed our question); and likelihood of reporting bias. Using these criteria, the baseline grade was re-examined. The grade for each outcome was decreased by 1 level for each of the following: a sparse body of evidence, a design that was not the strongest to answer the question, poor overall quality, inconsistent results, or a possibility of reporting bias. The final strength-of-evidence grade was then assigned.
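The second grading step lowers the baseline grade by 1 level for each concern identified. A minimal sketch of that arithmetic follows. The concern labels are hypothetical shorthand for the 5 criteria listed above, and stopping at very low once the bottom of the scale is reached is an assumption, since the text does not say what happens when more concerns apply than there are levels left.

```python
# Grades ordered from weakest to strongest.
LEVELS = ("very low", "low", "moderate", "high")

# Hypothetical labels for the 5 downgrade criteria named in the text.
CONCERNS = frozenset({
    "sparse_evidence",
    "weaker_design",
    "poor_quality",
    "inconsistent_results",
    "possible_reporting_bias",
})


def final_grade(baseline: str, concerns) -> str:
    """Lower the baseline grade 1 level per distinct concern, not below 'very low'."""
    found = set(concerns)
    unknown = found - CONCERNS
    if unknown:
        raise ValueError(f"unrecognized concerns: {sorted(unknown)}")
    index = LEVELS.index(baseline) - len(found)
    return LEVELS[max(index, 0)]  # assumption: the scale bottoms out at 'very low'


if __name__ == "__main__":
    print(final_grade("high", ["sparse_evidence", "inconsistent_results"]))
```

Under these assumptions, an outcome with a high baseline grade and 2 concerns ends at low, and an outcome with no concerns keeps its baseline grade.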

In grading the evidence, the direction of effects was evaluated as to whether it favored NPs, favored the comparison group, or showed no significant difference. In many cases, showing equivalence of outcome was considered a good outcome, similar to equivalence trials where the aim is to show the therapeutic equivalence of 2 treatments.[32] This was the case when comparing outcomes of care involving NPs with outcomes of care involving only physicians.