Development and Validation of a Clinical Risk Model to Predict the Hospital Mortality in Ventilated Patients With Acute Respiratory Distress Syndrome

A Population-Based Study

Weiyan Ye; Rujian Li; Hanwen Liang; Yongbo Huang; Yonghao Xu; Yuchong Li; Limin Ou; Pu Mao; Xiaoqing Liu; Yimin Li


BMC Pulm Med. 2022;22(268) 


Data Sources

All data used in the study were extracted from the Medical Information Mart for Intensive Care III (MIMIC-III) database (v1.4),[18] the Medical Information Mart for Intensive Care IV (MIMIC-IV) database (v1.0) and the eICU Collaborative Research Database (eICU-CRD).[19] MIMIC-III includes de-identified health-related data from more than 60,000 ICU stays at Beth Israel Deaconess Medical Center (BIDMC) from June 2001 to October 2012. MIMIC-IV consists of BIDMC data from 2008 to 2019. The eICU-CRD is a multicenter database comprising de-identified health data associated with over 200,000 ICU encounters from 335 units at 208 hospitals located throughout the US between 2014 and 2015. The authors who acquired data from the databases completed the course Protecting Human Research Participants on the website of the National Institutes of Health and obtained certification (Record ID: 28006489) before accessing the data. The three databases have received ethical approval from the Institutional Review Boards (IRBs) at BIDMC and Massachusetts Institute of Technology (MIT). As the databases do not contain identifiable health information, a waiver of informed consent was included in the approval.

Study Population

All patients in the MIMIC-III, MIMIC-IV and eICU-CRD databases who met the following criteria were included in the study. The inclusion criteria were: (I) age of 16 years or older; (II) diagnosis of ARDS within the first 48 h of ventilation; (III) invasive ventilation for at least 48 consecutive hours. Because the onset of ARDS is acute and our cohort comprised only recently mechanically ventilated patients, patients receiving ventilation through a tracheostomy cannula were excluded. Patients who were extubated or died during the first 48 h were also excluded. Notably, only data from the first ICU admission of the first hospitalization were analyzed. The subjects pooled from the MIMIC-III and eICU-CRD databases were randomly divided into a training set (70%) to develop the model and an internal validation set (30%) to test its performance. The cohort extracted from the MIMIC-IV database using the same inclusion criteria served as the external validation cohort. In MIMIC-IV, only data between 2014 and 2019 were included to avoid duplication with MIMIC-III.
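The 70/30 random division described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the seed, and the patient identifiers are hypothetical, and the source does not say whether the split was stratified by outcome or by database.

```python
import random

def split_train_validation(patient_ids, train_fraction=0.7, seed=42):
    """Randomly split unique patient IDs into training and internal validation sets."""
    ids = sorted(set(patient_ids))   # de-duplicate; sort so the result depends only on the seed
    rng = random.Random(seed)        # local generator keeps the split reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# Hypothetical pooled cohort (stand-in for the MIMIC-III + eICU-CRD patients)
pooled_ids = [f"patient_{i:04d}" for i in range(1000)]
train_ids, internal_val_ids = split_train_validation(pooled_ids)
print(len(train_ids), len(internal_val_ids))
```

Splitting on patient identifiers rather than on individual ICU stays keeps every record of a patient in a single set, which is consistent with analyzing only the first ICU admission of the first hospitalization.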

Data Extraction

Structured Query Language (SQL) with PostgreSQL tools (version 9.6) was used for data extraction. Because patients in MIMIC-III were admitted before publication of the Berlin definition, the presence of ARDS in the first 48 h of ventilation was identified by applying the Berlin definition[1] with the SQL code published by the PROVE Network Investigators.[20] As the patients in eICU-CRD and MIMIC-IV were admitted at least one year after publication of the Berlin definition, we assumed that they had been diagnosed with ARDS according to the Berlin definition and identified ARDS using International Classification of Diseases (ICD) codes in the databases. The following demographic data were extracted: age, gender, ethnicity, weight, height, and body mass index (BMI) at the first ICU admission. Medical history included the number of comorbidities, asthma, congestive heart failure (CHF), atrial fibrillation (AFIB), chronic renal disease, liver disease, chronic obstructive pulmonary disease, coronary artery disease (CAD), diabetes, hypertension, stroke, and malignancy. Diagnostic information was also extracted to explore the etiology of ARDS by classifying it as direct (pulmonary) or indirect (extrapulmonary) ARDS according to previous studies.[21,22] Vasopressor use within the first 24 h of ICU admission was recorded. Severity scores, including SAPS II in MIMIC-III, APACHE IV in eICU-CRD, and OASIS and SOFA in all three databases, were calculated from the original data. The Age, PaO2/FiO2, and Plateau Pressure Score (APPS)[8] was also calculated. We then collected vital signs within the first 24 h of ICU stay and within the first 24 h of invasive mechanical ventilation (IMV), including heart rate (HR), systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure (MAP), temperature, respiratory rate and oxyhemoglobin saturation (SpO2). Laboratory values within the first 24 h of ICU admission and within the first 24 h of IMV, such as routine blood tests, liver and kidney function, blood glucose, and arterial blood gas (ABG), were then extracted. The ventilator parameters within the first 24 h of IMV were also extracted. Because vital signs were sampled at high frequency, they were summarized by their maximum, minimum and mean values, whereas laboratory indicators and ventilator parameters were summarized by their maximum and minimum values. In-hospital death records were also extracted.
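The summarization of repeated measurements within a 24 h window could look like the sketch below. It assumes the readings have already been restricted to the relevant window and that missing readings are stored as None; the function name and the example values are hypothetical.

```python
from statistics import mean

def summarize(values, include_mean=True):
    """Collapse repeated measurements into max/min (and optionally mean)."""
    observed = [v for v in values if v is not None]   # ignore missing readings
    if not observed:
        return None                                   # nothing recorded in the window
    summary = {"max": max(observed), "min": min(observed)}
    if include_mean:
        summary["mean"] = mean(observed)
    return summary

# Hypothetical readings from the first 24 h
heart_rate = [88, 92, None, 110, 101, 95]   # vital sign: max, min and mean
lactate = [2.1, 3.4, 1.8]                   # laboratory value: max and min only

hr_summary = summarize(heart_rate)
lactate_summary = summarize(lactate, include_mean=False)
print(hr_summary, lactate_summary)
```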

Statistical Analysis

Normally and non-normally distributed continuous variables were presented as the mean ± SD and the median with interquartile range (IQR), respectively. The normality of continuous variables was tested with the Kolmogorov-Smirnov test. Student's t-test, one-way ANOVA, the Mann-Whitney U-test or the Kruskal-Wallis H-test was used to compare continuous data, as appropriate. Categorical variables were expressed as numbers with percentages and compared using the chi-square (χ²) test or Fisher's exact test, depending on sample size. The Multivariate Imputation by Chained Equations (MICE) package was used to impute missing data. Variables with more than 30% missing data were excluded from the variable selection process.
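The 30% missingness rule can be illustrated with a short sketch. The table layout (a dictionary of columns with None for missing values), the function names, and the variable names are assumptions made for the example; the imputation step itself is not reproduced here.

```python
def missing_fraction(column):
    """Share of entries in a column that are missing (None)."""
    return sum(v is None for v in column) / len(column)

def drop_sparse_variables(table, threshold=0.30):
    """Keep only variables whose missing fraction does not exceed the threshold."""
    return {name: col for name, col in table.items()
            if missing_fraction(col) <= threshold}

# Hypothetical candidate variables for 10 patients
table = {
    "heart_rate_mean":  [80, 95, 102, None, 88, 91, 77, 120, 99, 85],                # 10% missing
    "plateau_pressure": [22, None, 30, 27, None, 25, 24, None, 28, 26],              # 30% missing
    "lactate_max":      [2.1, None, None, 3.3, None, 1.9, None, 4.0, 2.2, 2.8],      # 40% missing
}

kept = drop_sparse_variables(table)
print(sorted(kept))
```

A variable with exactly 30% missing values is retained here, because the text excludes only variables with more than 30% missing data.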

All patients in the training set were included for variable selection and risk model development. A total of 176 variables were entered into the selection process. Least Absolute Shrinkage and Selection Operator (LASSO) regression was employed to identify potential strong predictors. Subsequently, the variables identified by LASSO regression were entered into a logistic regression model, and those that remained statistically significant were used to construct the risk model. A nomogram was used to interpret and visualize the risk model.
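For reference, LASSO-penalized logistic regression is commonly written as the optimization problem below. The source does not report how the penalty was tuned, so this is the generic textbook form rather than the authors' exact specification. The symbols are assumptions introduced for illustration: n patients in the training set, p candidate predictors (here 176), y_i the in-hospital death indicator, x_i the predictor vector, and λ the penalty strength.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The absolute-value penalty shrinks some coefficients exactly to zero, which is why the method doubles as a variable selection step: predictors with nonzero coefficients are the ones carried forward to the logistic regression model.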

The risk model was validated in the validation sets. To assess the discrimination of the model, the areas under the receiver operating characteristic curves (AUROCs) of our model and other severity scores were calculated. The calibration slope and the Brier score were calculated to evaluate calibration. Decision curve analysis (DCA)[23] was used to determine the clinical usefulness of our model by quantifying the net benefit at different threshold probabilities. The net benefit was calculated by subtracting the proportion of false-positive patients from the proportion of true-positive patients, with false positives weighted by the relative harm of forgoing an intervention compared with the negative consequences of an unnecessary intervention. To assess whether the performance of our model would be affected by the etiology of ARDS and the source of patients admitted to hospitals, we further compared model performance between direct and indirect ARDS, as well as between transferred and non-transferred patients.
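The discrimination, calibration and net-benefit measures named above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the function names and the toy data are hypothetical, the AUROC is computed by pairwise comparison (practical only for small samples), and the net benefit uses the standard decision-curve weighting of false positives by threshold / (1 - threshold). The calibration slope is omitted because it requires fitting a regression model.

```python
def auroc(y_true, y_prob):
    """Probability that a random death is ranked above a random survivor (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def net_benefit(y_true, y_prob, threshold):
    """True-positive rate minus false-positive rate weighted by the odds of the threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1 - threshold))

# Hypothetical outcomes (1 = in-hospital death) and predicted risks
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]

print(auroc(y_true, y_prob))
print(brier_score(y_true, y_prob))
print(net_benefit(y_true, y_prob, threshold=0.5))
```

Evaluating net_benefit over a range of thresholds and plotting the result against the "treat all" and "treat none" strategies yields the decision curve.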

The data were analyzed with R software (version 4.0.3, R Foundation). A two-tailed P < 0.05 was considered statistically significant.