A Comparative Analysis of Sepsis Identification Methods in an Electronic Database

Alistair E. W. Johnson, DPhil; Jerome Aboab, MD, PhD; Jesse D. Raffa, PhD; Tom J. Pollard, PhD; Rodrigo O. Deliberato, MD, PhD; Leo A. Celi, MD, MPH; David J. Stone, MD


Crit Care Med. 2018;46(4):494-499. 


Material and Methods

Study Population

Study data were acquired from the Medical Information Mart in Intensive Care (MIMIC)-III database v1.4.[10] MIMIC-III is a large, openly available, deidentified dataset of patients admitted to the Beth Israel Deaconess Medical Center (BIDMC, Boston, MA). The database encompasses admissions between 2001 and 2012. Use of the MIMIC-III database was approved by the Institutional Review Boards of BIDMC and the Massachusetts Institute of Technology. Data extraction adhered to the original Sepsis-3 study as closely as possible.[3,4] We focused on ICU admissions from 2008 to 2012 for three reasons: antibiotic prescriptions are only recorded from 2003 onward; explicit sepsis codes were introduced at BIDMC in 2004; and the group of admissions between 2008 and 2012 is easily identifiable in the database (Supplemental Table 1, Supplemental Digital Content 1, http://links.lww.com/CCM/D208; Supplemental Figure 1, Supplemental Digital Content 1, http://links.lww.com/CCM/D208).

A total of 23,620 ICU admissions were analyzed; of these, we excluded three nonadults, 7,536 secondary (or greater) admissions of the same patient to avoid repeated measures, 2,298 admissions to the cardiothoracic surgical service since their postoperative physiologic derangements do not translate to the same mortality risk as in other ICU patients, and 18 admissions with missing data. We also excluded patients suspected of infection more than 24 hours before ICU admission, as MIMIC-III only contains ICU data (excludes 1,250 patients), and patients suspected of infection more than 24 hours after ICU admission, as we chose to focus on the majority of patients who are admitted to the ICU with sepsis (excludes 824 patients). The final cohort contained 11,791 patients.
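The timing exclusion described above can be illustrated with a minimal sketch. The function name, its arguments, and the decision to retain patients with no suspected infection are assumptions made for illustration; they are not taken from the study's published code.

```python
from datetime import datetime, timedelta
from typing import Optional

# Window around ICU admission within which suspicion of infection must fall.
WINDOW = timedelta(hours=24)


def within_icu_window(icu_intime: datetime,
                      suspicion_time: Optional[datetime]) -> bool:
    """Return True if the admission passes the timing rule.

    Hypothetical helper: admissions whose suspicion of infection falls more
    than 24 hours before or after ICU admission are dropped. Admissions with
    no suspected infection (suspicion_time is None) are assumed to be kept.
    """
    if suspicion_time is None:
        return True
    # abs() on a timedelta gives the unsigned distance between the two times.
    return abs(suspicion_time - icu_intime) <= WINDOW
```

Treating the boundary (exactly 24 hours) as inside the window is a choice made here; the text does not specify how ties at the boundary were handled.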


As in Seymour et al,[4] our primary outcome was hospital mortality and the secondary outcome was a composite of hospital mortality and/or prolonged (≥3 d) ICU length of stay (LOS).


In our study, we precisely replicated the Sepsis-3 task force[3,4] definition of suspected infection as the acquisition of a body fluid culture temporally contiguous to administration of antibiotics. Other data extracted included patient demographics and all necessary variables for calculating SOFA scores,[11] which were calculated using data from the first 24 hours of the ICU stay. The Sepsis-3 criteria for sepsis were extracted as suspected infection with associated organ dysfunction (SOFA ≥ 2). Five other definitions of sepsis were extracted: 1) explicit criteria: the presence of at least one of the two proposed International Classification of Diseases, 9th revision (ICD-9), codes explicitly mentioning sepsis (995.92, severe sepsis and 785.52, septic shock); 2) Angus methodology: ICD-9 codes for sepsis as proposed by Angus et al;[8] 3) Martin methodology: ICD-9 codes proposed by Martin et al;[9] 4) the Centers for Medicare & Medicaid Services (CMS) criteria: an adaptation of the CMS Severe Sepsis and Septic Shock Management Bundle (National Quality Forum no. 0500) which uses a combination of diagnostic ICD-9 codes, Systemic Inflammatory Response Syndrome (SIRS) criteria, and specific thresholds for organ dysfunction;[12] and 5) the Centers for Disease Control and Prevention (CDC) complete surveillance criteria, which use suspicion of infection criteria that are identical to Sepsis-3 along with organ dysfunction criteria that are similar (but not identical) to SOFA.[13]
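As an illustration of the suspected infection and Sepsis-3 rules above, the following is a minimal sketch rather than the study's extraction code. The text only says that culture and antibiotics must be "temporally contiguous"; the specific windows used here (antibiotics within 72 hours after a culture, or a culture within 24 hours after antibiotics, with onset taken as the earlier event) follow the definition in Seymour et al[4] and should be read as an assumption about how that phrase is operationalized. Function and constant names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence

# Windows assumed from Seymour et al; not stated explicitly in this section.
ABX_AFTER_CULTURE = timedelta(hours=72)
CULTURE_AFTER_ABX = timedelta(hours=24)


def suspected_infection_time(culture_times: Sequence[datetime],
                             antibiotic_times: Sequence[datetime]
                             ) -> Optional[datetime]:
    """Return the earliest onset of suspected infection, or None.

    A culture/antibiotic pair qualifies when the antibiotic follows the
    culture within 72 h, or the culture follows the antibiotic within 24 h.
    Onset is the time of whichever event in the pair came first.
    """
    onsets = []
    for c in culture_times:
        for a in antibiotic_times:
            if c <= a <= c + ABX_AFTER_CULTURE:
                onsets.append(c)   # culture first, antibiotic within 72 h
            elif a < c <= a + CULTURE_AFTER_ABX:
                onsets.append(a)   # antibiotic first, culture within 24 h
    return min(onsets) if onsets else None


def meets_sepsis3(suspicion_time: Optional[datetime], sofa_score: int) -> bool:
    """Sepsis-3 flag: suspected infection with SOFA >= 2."""
    return suspicion_time is not None and sofa_score >= 2
```

In this sketch the SOFA score is passed in already computed; in the study it was calculated from the first 24 hours of the ICU stay.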


Demographics for the cohort were extracted. The cohort was also grouped by survival at hospital discharge, and statistical comparison between these groups was done using the two-sample t test, Pearson's χ² test, or the Mann-Whitney-Wilcoxon U test, as indicated. We evaluated SOFA against primary and secondary outcomes. The discrimination of SOFA was evaluated using the area under the receiver operating characteristic (AUROC) curve. We compared the population identified by the Sepsis-3 criteria with the populations identified by the other criteria in three ways: visually, using Cronbach's alpha, and via their relationship to the primary and secondary outcomes. Statistical significance was set at the 0.001 level as in Seymour et al.[4]
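The two summary statistics named above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the analysis code used in the study: the AUROC is computed through its pairwise (Mann-Whitney) interpretation, and Cronbach's alpha is computed from the textbook formula, under the assumption that each "item" is one criterion's 0/1 sepsis flag across the same patients. Function names are hypothetical.

```python
from statistics import variance
from typing import Sequence


def auroc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUROC as the probability that a random positive case scores higher
    than a random negative case, counting ties as one half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k items, each a sequence over the same patients:
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(row) for row in zip(*items)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total score has zero variance")
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / total_var)
```

For example, `auroc(sofa_scores, died_in_hospital)` would give the discrimination of SOFA for the primary outcome, and `cronbach_alpha([sepsis3_flags, other_flags])` would summarize agreement between two criteria; these variable names are placeholders. The pairwise AUROC is quadratic in cohort size and is intended for clarity rather than speed.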

As MIMIC-III is open to the public, our study is completely accessible, reproducible, and available online.[14]