Risks of Death and Severe Disease in Patients With Middle East Respiratory Syndrome Coronavirus, 2012–2015

Caitlin M. Rivers; Maimuna S. Majumder; Eric T. Lofgren


Am J Epidemiol. 2016;184(6):460-464. 


Data Sources

A publicly accessible line listing of MERS-CoV cases, maintained by Dr. Andrew Rambaut and available online,[12] was accessed on August 4, 2015. This line listing contained 1,291 cases of MERS-CoV infection pulled from a number of sources, including the World Health Organization and the government of the Kingdom of Saudi Arabia. This data set has often been more up-to-date than official World Health Organization case reports, especially early in the epidemic. The outcomes available are as reported and do not necessarily reflect the final status of the patient after prolonged follow-up, so some misclassification of outcomes is possible. The majority of MERS-CoV cases occurred in Saudi Arabia, South Korea, and the United Arab Emirates (Appendix Table 1). The outbreak in South Korea was excluded from the analysis because of its unique nature, resulting in 1,105 cases after exclusion.

Exposure Definition and Covariate Selection

Outcomes of interest were death and severe disease. Vital status (alive or deceased) was taken as reported at the time of initial reporting. Patients were classified as having severe disease if they had either died from their infection or required critical care at the time of initial reporting; patients who experienced few or less serious complications were classified as not having severe disease.
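The two outcome definitions can be illustrated with a minimal Python sketch. The field names (`died`, `critical_care`) and the three example records are hypothetical and are not taken from the line listing; the sketch also assumes both fields are observed for every case, which was not true of the actual data.

```python
def classify_outcomes(died, critical_care):
    """Return (death, severe) flags for one case as of initial reporting.

    Assumes both inputs are observed booleans; missing values are not handled.
    """
    death = bool(died)
    # Severe disease: the patient died or required critical care.
    severe = death or bool(critical_care)
    return death, severe


# Hypothetical records for illustration only.
cases = [
    {"died": True, "critical_care": False},
    {"died": False, "critical_care": True},
    {"died": False, "critical_care": False},
]

flags = [classify_outcomes(c["died"], c["critical_care"]) for c in cases]
n_deaths = sum(death for death, _ in flags)
n_severe = sum(severe for _, severe in flags)
```

By construction every death is also counted as a severe case, so the number of severe cases can never be smaller than the number of deaths.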

Risk factors considered were the patient's age, the date of onset of the infection, the presence or absence of any underlying comorbidity such as cardiac or renal disease, reported contact with camels or other animals, whether the patient was employed as a health-care worker, whether the case was primary or secondary (based on reported contact with an existing case), whether the case arose in Saudi Arabia (the nation in which the majority of cases originated), the patient's sex, the number of days since January 1, 2012, and the time between onset of infection and subsequent hospitalization.

Missing Data

Because of the emerging nature of the disease, the widely varying sources from which the case reports were drawn, difficulty in case ascertainment, and sparse reporting, the data set used[12] had extensive missing data. There were 920 cases with missing information on 1 or more variables (including outcome variables), making conventional complete-case analysis essentially impossible. Moreover, because there was no evidence that the data were missing completely at random, estimates from a complete-case analysis could be biased.
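The scale of the problem follows directly from the counts reported above: 920 incomplete cases out of the 1,105 retained for analysis. The short Python sketch below works through that arithmetic and shows one way a complete-case filter could be written. The record layout and the use of `None` for a missing value are assumptions made for illustration.

```python
TOTAL_CASES = 1105       # cases remaining after exclusion of the South Korea outbreak
INCOMPLETE_CASES = 920   # cases missing 1 or more variables

complete_cases = TOTAL_CASES - INCOMPLETE_CASES
incomplete_share = INCOMPLETE_CASES / TOTAL_CASES


def is_complete(record):
    """A record is complete when no field is missing (None, by assumption)."""
    return all(value is not None for value in record.values())


# Hypothetical records for illustration only.
records = [
    {"age": 54, "sex": "M", "died": True},
    {"age": None, "sex": "F", "died": False},
    {"age": 37, "sex": None, "died": None},
]
usable = [r for r in records if is_complete(r)]
```

With these figures, a complete-case analysis would retain 185 cases and discard roughly 83% of the data set.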

We used a bootstrap-based expectation maximization method to multiply impute the missing information.[13] One hundred imputations were generated under the assumption that all data for the variables included in the analysis, missing or observed, came from a multivariate normal distribution. A ridge prior of 1% of the empirical data was used to assist with the numerical stability of the algorithm. The ridge prior in essence adds a number of observations equal to 1% of the data set with the same mean and variance as the observed data, but with no covariance. This shrinks the covariance between the variables in the imputation model and helps the algorithm converge on a stable solution, which is sometimes necessary with high degrees of missingness, as in this case. Priors using 0.5% of the data or 2% of the data did not result in meaningful differences in the results (not shown).
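The shrinking effect of the ridge prior can be sketched numerically. The Python example below follows the simplified description given above (extra observations with the same mean and variance but zero covariance) and is not the imputation package's actual implementation. The function name and the example covariance matrix are hypothetical, and degrees-of-freedom corrections are ignored.

```python
def ridge_shrunk_covariance(cov, n_obs, prior_fraction=0.01):
    """Pool an observed covariance matrix with prior observations that share
    its means and variances but have zero covariance.

    Variances (diagonal) are unchanged; covariances (off-diagonal) are scaled
    by n_obs / (n_obs + n_prior), where n_prior = prior_fraction * n_obs.
    """
    n_prior = prior_fraction * n_obs
    weight = n_obs / (n_obs + n_prior)
    size = len(cov)
    return [
        [cov[i][j] if i == j else cov[i][j] * weight for j in range(size)]
        for i in range(size)
    ]


# Hypothetical 2-variable covariance matrix (unit variances, covariance 0.6).
observed = [[1.0, 0.6], [0.6, 1.0]]

shrunk = ridge_shrunk_covariance(observed, n_obs=1105, prior_fraction=0.01)

# Shrinkage factors for the three prior sizes mentioned in the text.
factors = {f: 1 / (1 + f) for f in (0.005, 0.01, 0.02)}
```

Under this simplification the covariances are multiplied by about 0.995, 0.990, and 0.980 for priors of 0.5%, 1%, and 2%, respectively, which is consistent with the report that the choice among these values made little difference.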

Regression Models

Poisson regression models using a robust variance estimator[14] were used to estimate the univariate relative risk of either outcome according to each potential risk factor. Estimates from these models are comparable to those obtained using binomial regression, but the models are often more computationally tractable. Those variables that were moderately associated (P < 0.20) with the outcome were included in a multivariate risk model. All analyses were performed with the R statistical programming language (R Foundation for Statistical Computing, Vienna, Austria) using the Amelia2 package for multiple imputation.[15]
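To make the univariate step concrete, the following Python sketch fits a Poisson regression with a log link to a binary outcome and a single risk factor, then computes a robust (sandwich) standard error and a Wald P value for the P < 0.20 screen. It is an illustration under stated assumptions rather than the analysis code used in the study, which was written in R: the function name is hypothetical, the counts (40 of 100 exposed and 20 of 100 unexposed patients with the outcome) are invented, and the multiple-imputation step is omitted.

```python
import math


def modified_poisson(x, y, max_iter=50, tol=1e-12):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson.

    Returns (b0, b1, robust_se_b1); the standard error comes from the
    sandwich estimator A^-1 B A^-1.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        s0 = s1 = a00 = a01 = a11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)
            s0 += yi - mu            # score for the intercept
            s1 += (yi - mu) * xi     # score for the slope
            a00 += mu                # information matrix A
            a01 += mu * xi
            a11 += mu * xi * xi
        det = a00 * a11 - a01 * a01
        step0 = (a11 * s0 - a01 * s1) / det
        step1 = (a00 * s1 - a01 * s0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break

    # Rebuild A and the empirical "meat" matrix B at the final estimates.
    a00 = a01 = a11 = m00 = m01 = m11 = 0.0
    for xi, yi in zip(x, y):
        mu = math.exp(b0 + b1 * xi)
        r2 = (yi - mu) ** 2
        a00 += mu
        a01 += mu * xi
        a11 += mu * xi * xi
        m00 += r2
        m01 += r2 * xi
        m11 += r2 * xi * xi
    det = a00 * a11 - a01 * a01
    # Slope entry of A^-1 B A^-1, using the second row of A^-1.
    var_b1 = (a01 ** 2 * m00 - 2 * a01 * a00 * m01 + a00 ** 2 * m11) / det ** 2
    return b0, b1, math.sqrt(var_b1)


# Hypothetical data: 40/100 exposed and 20/100 unexposed with the outcome.
x = [1] * 100 + [0] * 100
y = [1] * 40 + [0] * 60 + [1] * 20 + [0] * 80

b0, b1, se = modified_poisson(x, y)
relative_risk = math.exp(b1)
ci_low = math.exp(b1 - 1.96 * se)
ci_high = math.exp(b1 + 1.96 * se)
p_value = math.erfc(abs(b1 / se) / math.sqrt(2))  # two-sided Wald test
keep_for_multivariate = p_value < 0.20
```

With a single binary risk factor, the fitted relative risk is simply the ratio of the two observed risks, and the robust variance of its logarithm reduces to the familiar 1/a - 1/n1 + 1/c - 1/n0. The model-based Poisson variance (1/a + 1/c) would be larger, which is why the robust estimator is needed when the outcome is binary.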

Human Subjects Approval

Because this work used entirely publicly available information with no personal identifiers, it was determined to not require approval by an institutional review board.