Estimating the Effect of Social Distancing Interventions on COVID-19 in the United States

Andrew M. Olney; Jesse Smith; Saunak Sen; Fridtjof Thomas; H. Juliette T. Unwin


Am J Epidemiol. 2021;190(8):1504-1509. 




We used 3 different types of data: state-level intervention data, infection fatality rate data, and confirmed case fatality data. These are described in depth below.

State-level Intervention Data. We created a data set[9] of state-level intervention dates by inspecting executive orders, public health directives, and official communications (e.g., press releases) from state governments. For each intervention, we used the effective date unless the timing of the intervention was so close to midnight that it could only practically be implemented the next day. Interventions were only counted if they targeted the general population. The interventions themselves closely paralleled those in the European model we used[7] but had slightly different operationalizations. "Self-isolating if ill" is a recommendation to stay home if sick. "Social distancing encouraged" is a recommendation to avoid nonessential travel and/or contact; the mere words "social distancing" were not counted unless elaborated upon with examples of what social distancing entails. "Schools or universities closing" refers to the date on which schools partly or completely closed; the earlier of schools or universities closing was used. "Banning of sports events" is the banning of sporting events or public gatherings of more than 1,000 persons. "Banning of public events" is the banning of public gatherings of more than 100 participants. Finally, "lockdown" includes banning of nonessential gatherings or business operations, which is sometimes formalized as a stay-at-home or safer-at-home order. Notably, some more restrictive interventions imply others are also in place; for example, lockdown implies all other interventions are applicable, and banning of public events implies banning of sports events.
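The implication rules at the end of this paragraph can be illustrated with a minimal sketch. The intervention keys, the `effective_dates` and `indicator` helpers, and the example dates below are hypothetical and are not taken from the authors' data set[9]; the sketch only shows one way a more restrictive intervention could pull forward the start date of the interventions it implies.

```python
from datetime import date

# Hypothetical short names for the six interventions described above.
INTERVENTIONS = [
    "self_isolating_if_ill",
    "social_distancing_encouraged",
    "schools_universities",
    "sports",
    "public_events",
    "lockdown",
]

def effective_dates(raw):
    """Apply the implication rules to a dict of intervention -> start date.

    Rules taken from the text: banning of public events implies banning of
    sports events, and lockdown implies every other intervention.
    """
    dates = dict(raw)
    if "public_events" in dates:
        pe = dates["public_events"]
        dates["sports"] = min(dates.get("sports", pe), pe)
    if "lockdown" in dates:
        ld = dates["lockdown"]
        for name in INTERVENTIONS:
            dates[name] = min(dates.get(name, ld), ld)
    return dates

def indicator(dates, name, day):
    """Return 1 if the intervention is in place on `day`, else 0."""
    return int(name in dates and day >= dates[name])

# Hypothetical example state (dates invented for illustration only).
example = effective_dates({
    "sports": date(2020, 3, 20),
    "public_events": date(2020, 3, 16),
    "lockdown": date(2020, 3, 24),
})
```

In this invented example, the sports-event ban is treated as starting on March 16 because the public-events ban on that date already implies it, and the interventions with no recorded date inherit the lockdown date.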

Infection Fatality Rate Data. The infection fatality rate (IFR), or ratio of fatalities to true infections, was derived using the methods outlined in Flaxman et al.[7,8] Briefly, IFR estimates from Verity et al.[10] were adjusted using an age-specific United Kingdom contact matrix to account for nonuniform attack rates across age groups (see Ferguson et al.[11] for details and previous US application). The resulting age-specific IFRs were weighted by state-level age demographic characteristics and averaged to produce estimates that were adjusted for both age and location. Demographic data were obtained from the 2018 American Community Survey 5-year estimates.[12]
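The final weighting step can be sketched as a population-weighted average. The function name `state_ifr` and all numbers below are hypothetical placeholders; they are not the Verity et al. estimates or American Community Survey counts, and the contact-matrix adjustment described above is assumed to have been applied already.

```python
import numpy as np

def state_ifr(age_ifr, age_population):
    """Population-weighted average of age-specific IFRs for one state.

    age_ifr:        attack-rate-adjusted IFR per age band (fractions)
    age_population: state population count in the same age bands
    """
    age_ifr = np.asarray(age_ifr, dtype=float)
    pop = np.asarray(age_population, dtype=float)
    if age_ifr.shape != pop.shape:
        raise ValueError("age bands must match between IFR and population")
    weights = pop / pop.sum()          # share of the population in each band
    return float(np.dot(weights, age_ifr))

# Hypothetical three-band example (values invented for illustration only).
example_ifr = state_ifr([0.0002, 0.002, 0.05], [400_000, 500_000, 100_000])
```

A state with an older population puts more weight on the high-IFR bands, which is how the same age-specific estimates can yield different state-level IFRs.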

Confirmed Case Fatality Data. SARS-CoV-2 fatality data were obtained from the New York Times public data repository,[13] which includes information on the data-collection process and methodology. In general, in the data set, confirmed cases were counted based on where they were treated and on the days they were reported up to midnight Eastern Time. Because this data set provides cumulative counts, we transformed these into daily counts by taking the difference between successive daily cumulative counts (setting this difference to zero in the rare instances in which cumulative counts decreased because of reporting corrections).
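The differencing step described here can be sketched as follows. The function name `daily_from_cumulative` is hypothetical, and the sketch assumes that the first day's cumulative count consists entirely of new deaths; the authors' actual preprocessing code may differ.

```python
import numpy as np

def daily_from_cumulative(cumulative):
    """Convert cumulative counts to daily counts.

    Differences successive days and sets negative differences (which arise
    when a reporting correction lowers the cumulative count) to zero.
    """
    cumulative = np.asarray(cumulative, dtype=int)
    # prepend=0 is an assumption: the first day's count is treated as new.
    daily = np.diff(cumulative, prepend=0)
    return np.maximum(daily, 0)

# Invented series with one downward correction on the fourth day.
example_daily = daily_from_cumulative([0, 1, 3, 2, 5])
```

One consequence of clamping at zero is that the daily counts no longer sum exactly to the final cumulative count whenever a correction has occurred.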


We took an established, semimechanistic Bayesian hierarchical model of interventions, originally developed for the spread of SARS-CoV-2 in Europe, and applied it to the United States. The design and details of this model are presented elsewhere[7,8] (see the Web Appendix, available at, for a brief overview). Notably, a recent variant of this model has been applied to US data at the state level, but that variant uses mobility data rather than interventions as the basis of predictions.[14] Briefly stated, daily death counts in the model follow a negative binomial distribution whose expectation is a function of infections on previous days. The model is semimechanistic in the sense that it incorporates classical susceptible-infected-removed concepts[15] in a Bayesian framework. The number of infected is modeled using a discrete renewal process, and death counts are linked to the number of infected through the state IFR and the distribution of times from infection to death. Importantly, the model assumes that the effect of an intervention is the same regardless of location and that implementing an intervention instantaneously reduces the time-varying reproduction number, Rt. Making these assumptions allows pooling of data from states for estimation of intervention effects. The model was specified using Stan,[16] and model inference was performed using adaptive Hamiltonian Monte Carlo.

We fit our model with a time series for each state beginning 30 days before the state had experienced 7 deaths, within the timeframe from February 29, 2020, up to April 25, 2020, when some states began reversing their interventions. Seven deaths is a somewhat arbitrary threshold for excluding imported cases, and others have used 5[14] or 10[8] deaths for this threshold. We chose 7 because it is the highest number we could use and still obtain valid data for states like Alaska, which had a relatively low case count during this period.
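The renewal process and the link from infections to expected deaths can be illustrated with a small deterministic sketch. This is not the authors' Stan model: it computes only expectations (no negative binomial likelihood and no Bayesian inference), the function names are hypothetical, and the log-linear form used for the intervention effect on Rt is an assumption made here for illustration. The generation-interval and infection-to-death distributions are passed in as plain probability vectors.

```python
import numpy as np

def rt_from_interventions(r0, alphas, indicators):
    """Assumed log-linear form: Rt = R0 * exp(-sum_k alpha_k * x_{k,t}).

    indicators has shape (T, K) with 0/1 entries, so Rt drops on the day
    an intervention switches on (the instantaneous-effect assumption).
    """
    x = np.asarray(indicators, dtype=float)
    return r0 * np.exp(-x @ np.asarray(alphas, dtype=float))

def expected_infections_and_deaths(rt, seed, gen_interval, inf_to_death, ifr):
    """Discrete renewal process plus expected daily deaths.

    gen_interval[s-1]: probability that transmission occurs s days after
                       infection of the source case (s >= 1)
    inf_to_death[s-1]: probability that a fatal infection results in death
                       s days after infection (s >= 1)
    """
    rt = np.asarray(rt, dtype=float)
    g = np.asarray(gen_interval, dtype=float)
    pi = np.asarray(inf_to_death, dtype=float)
    n_days = len(rt)
    infections = np.zeros(n_days)
    infections[0] = seed
    for t in range(1, n_days):
        lags = np.arange(1, min(t, len(g)) + 1)
        # New infections: Rt times past infections weighted by the
        # generation-interval distribution.
        infections[t] = rt[t] * np.sum(infections[t - lags] * g[lags - 1])
    deaths = np.zeros(n_days)
    for t in range(1, n_days):
        lags = np.arange(1, min(t, len(pi)) + 1)
        # Expected deaths: IFR times past infections weighted by the
        # infection-to-death distribution.
        deaths[t] = ifr * np.sum(infections[t - lags] * pi[lags - 1])
    return infections, deaths

# Toy check with all probability mass at a one-day lag (invented values).
toy_inf, toy_deaths = expected_infections_and_deaths(
    rt=[2.0, 2.0, 2.0], seed=1.0, gen_interval=[1.0],
    inf_to_death=[1.0], ifr=0.5,
)
```

In the full model, the expected deaths computed this way would serve as the mean of the negative binomial distribution for observed daily deaths, and sharing the intervention coefficients across states is what allows the pooling described above.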