SARS-CoV-2 Vaccine Breakthrough by Omicron and Delta Variants, New York, USA

Alexander C. Keyel; Alexis Russell; Jonathan Plitnick; Jemma V. Rowlands; Daryl M. Lamson; Eli Rosenberg; Kirsten St. George


Emerging Infectious Diseases. 2022;28(10):1990-1998. 


Data Analysis

Omicron Emergence Analysis

We analyzed emergence of the SARS-CoV-2 Omicron variant from November 28, 2021, through January 24, 2022 (Figure 1). We matched persons infected with Omicron (case-patients) to persons infected with any other virus lineage (controls). Case-patients (n = 1,439) were infected with B.1.1.529 or any BA sublineage (at the time of the analysis, none were classified as BA.2 through BA.5). Controls (n = 728) were persons infected with any other SARS-CoV-2 lineage circulating during the period of Omicron emergence (all sequenced control samples in the matched dataset were Delta variant, B.1.617.2 or AY sublineages).

We defined the start of the Omicron emergence period as the date of first detection in the genomic surveillance dataset (although Omicron was present in the state before that date). The emergence period ended when the last non-Omicron case was detected in the surveillance dataset. One additional case of infection with Delta was identified >14 days after the last date in the surveillance dataset but was excluded because the sensitivity analysis indicated that it would not substantively change the analysis results.

We matched case-patients to controls on the basis of specimen collection date (±6 days), location (using New York state economic regions [Figure 1]), patient age, and patient sex. We matched age according to age groups: 0–4, 5–11, 12–17, 18–29, 30–49, 50–69, 70–89, and ≥90 years. If an exact match could not be found, we allowed mismatches for sex. We used 1-to-1 matching without replacement (i.e., each case-patient was matched to a unique control). We performed matching in 2 stages. In the first stage, we considered all possible matches for each case-patient. In the second stage, to maximize the sample size, we sorted case-patients so that those with the fewest possible matches were matched to controls first.

To estimate odds ratios (ORs) and 95% CIs, we performed 3 sets of conditional logistic regressions.
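The 2-stage matching described above can be illustrated with a minimal Python sketch. This is not the authors' code (the analyses were run in R); the record layout (dictionaries with `id`, `date`, `region`, `age`, and `sex` keys) and the function names are hypothetical, and the handling of sex mismatches is an assumption about how the fallback was applied.

```python
from bisect import bisect_right
from datetime import date

# Lower bounds of the age groups listed in the text: 0-4, 5-11, ..., >=90 years.
AGE_BREAKS = [0, 5, 12, 18, 30, 50, 70, 90]


def age_group(age):
    """Return the index of the age group containing `age` (0 for 0-4 years)."""
    return bisect_right(AGE_BREAKS, age) - 1


def is_candidate(case, control, allow_sex_mismatch=False, max_days=6):
    """Check the matching criteria: collection date, region, age group, sex."""
    if abs((case["date"] - control["date"]).days) > max_days:
        return False
    if case["region"] != control["region"]:
        return False
    if age_group(case["age"]) != age_group(control["age"]):
        return False
    return allow_sex_mismatch or case["sex"] == control["sex"]


def match_pairs(cases, controls):
    """Greedy 1-to-1 matching without replacement.

    Returns a list of (case_id, control_id) tuples.
    """
    # Stage 1: list every possible control for each case-patient.
    candidates = {}
    for case in cases:
        exact = [c["id"] for c in controls if is_candidate(case, c)]
        relaxed = [
            c["id"] for c in controls
            if is_candidate(case, c, allow_sex_mismatch=True)
        ]
        # Assumption: sex mismatches are considered only when the
        # case-patient has no exact candidate at all.
        candidates[case["id"]] = exact if exact else relaxed

    # Stage 2: match the case-patients with the fewest options first.
    order = sorted(candidates, key=lambda case_id: len(candidates[case_id]))
    used, pairs = set(), []
    for case_id in order:
        for control_id in candidates[case_id]:
            if control_id not in used:
                used.add(control_id)
                pairs.append((case_id, control_id))
                break
    return pairs
```

Sorting by the number of candidates keeps a case-patient with many options from consuming the only control available to another case-patient, which is why the ordering can increase the number of matched pairs.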

Figure 1.

Matched case–control pairs used in the conditional logistic regression by analysis for the SARS-CoV-2 Delta variant (March 19, 2021–August 15, 2021) and the Omicron variant (November 28, 2021–January 24, 2022) emergence periods, by economic region (map), New York, USA. The bars correspond to the order given in the legend; New York City is on top when present and Long Island on bottom when present. The dashed line separates the 2 datasets used in the analyses; the Delta emergence period is on the left and the Omicron emergence period on the right. Map base layer was derived from a combination of 2 public domain layers (US Census data and Natural Earth administrative boundaries).

In analysis 1, we included vaccinated and unvaccinated persons. Key variables tested were vaccination status (binary: yes/no), booster status (yes/no), vaccine type (none, Pfizer, Moderna, Janssen), and time since last vaccination or booster (3 factor levels: unvaccinated, vaccinated <90 days, vaccinated ≥90 days). We also explored time since completion of initial vaccination and time since booster as separate variables but excluded them because they were less predictive and overlapped strongly with the combined time since last vaccination or booster variable.
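As an illustration of the 3-level time variable, the following Python sketch codes a record from its sample collection date and the date of the most recent dose. The function name and arguments are hypothetical, and measuring the interval from the dose date itself (rather than from some later reference point) is an assumption.

```python
from datetime import date


def time_since_last_dose_level(sample_date, last_dose_date=None):
    """Code time since last vaccination or booster as one of 3 factor levels."""
    if last_dose_date is None:
        return "unvaccinated"
    # Assumption: elapsed time is counted in whole days from the last dose.
    days = (sample_date - last_dose_date).days
    return "vaccinated <90 days" if days < 90 else "vaccinated >=90 days"
```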

In analysis 2, we examined the association between patient age and virus lineage and therefore removed age as a matching criterion. We performed a conditional logistic regression using age, other main variables for context, and interactions. For this analysis, we did not perform sorting before matching. We examined age in 2 ways: with each age group treated as a factor and with each age group treated as a continuous predictor. Model exploration revealed that a mixture of categorical and continuous predictors best described the underlying data structure (Appendix Table 1).

In analysis 3, we again matched case-patients to controls on the basis of age, but we excluded unvaccinated persons so that time since last dose (vaccination series or booster) could be treated as a continuous variable. Unvaccinated persons could not be included in this analysis because assigning them NA (not applicable) would cause their records to be excluded, and 0 would be an unrealistic value.

We tested leverage by removing each case–control pair sequentially, refitting the model, and noting the change in the OR. We selected models by using Akaike information criterion (AIC) scores.[15,16] Models with lower AIC scores have more support, and models with ΔAIC >2 are generally considered less likely. Because a more complicated nested model can be within ΔAIC of 2, nested models were required to be within 2 × the number of model parameters to be considered tied.[17] Of note, AIC provides a relative ranking of models but no information on the absolute fit of a model. We examined the fit of each model by considering its statistical significance and the OR estimates. When test results were not significant, we examined the magnitude of the OR. More research was deemed necessary if the estimated OR was large enough to be a public health concern but the 95% CI included 1.
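The leverage check and the ΔAIC ranking can be sketched in Python under a strong simplifying assumption: 1-to-1 matched pairs and a single binary exposure, for which the conditional maximum likelihood OR reduces to the ratio of discordant pairs. The study itself fitted multivariable models in R, so this is only an illustration of the idea; all function names are hypothetical.

```python
import math


def delta_aic(aic_by_model):
    """Return each model's AIC minus the lowest AIC in the candidate set."""
    best = min(aic_by_model.values())
    return {name: aic - best for name, aic in aic_by_model.items()}


def matched_pair_or(pairs):
    """Conditional ML odds ratio for 1-to-1 pairs with one binary exposure.

    Each pair is (case_exposed, control_exposed). Only discordant pairs
    carry information; the estimate is the ratio of their counts.
    """
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    if control_only == 0:
        # Undefined without discordant pairs, infinite if only one kind exists.
        return math.inf if case_only else math.nan
    return case_only / control_only


def leave_one_pair_out(pairs):
    """Drop each pair in turn, re-estimate, and return the change in the OR."""
    full = matched_pair_or(pairs)
    changes = []
    for i in range(len(pairs)):
        reduced = pairs[:i] + pairs[i + 1:]
        changes.append(matched_pair_or(reduced) - full)
    return changes
```

A pair whose removal shifts the OR far more than the others would be flagged as high leverage; concordant pairs never change this simplified estimate.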

We performed all analyses in R 4.1.2[18] by using the package survival for conditional logistic regressions.[19,20] We created the New York state map in ArcGIS 10.6 (ESRI) by using a 2017 TIGER shapefile from the US Census Bureau[21] and the Admin 1 States, provinces 50-m cultural vector shapefile from Natural Earth Data (as of March 18, 2022).

Delta Emergence Analysis

We analyzed emergence of the SARS-CoV-2 Delta variant during March 19, 2021–August 15, 2021 (Figure 1). The Delta analyses followed the same methods used for the Omicron analyses but with focal virus lineages (603 case-patients) including B.1.617.2 and all AY sublineages. Nonfocal virus lineages (1,816 controls) were all other lineages circulating during the period of Delta emergence (62% B.1.1.7 and Q.4 Alpha, 20% B.1.526 Iota, 3.5% P.1.X Gamma, 1% B.1.351.X Beta); none of the other non–variant of concern strains (13.7% combined) exceeded 5%. We excluded booster-associated variables because booster doses were not available (Appendix Figure 3). We omitted the vaccinated-only analysis because of low statistical power (n = 12 pairs).

Power Analysis

Statistical power for conditional logistic regression is nonlinear and depends on estimated probabilities. Although we used multiple conditional logistic regression for the analyses described above, to make the power analyses easier to set up and interpret, we calculated statistical power for univariate logistic regression by using the WebPower package[22,23] as a simplifying assumption. We examined statistical power to detect an OR of 2 with a sample size of 110 for a range of probability values (0.1–0.9 for the upper probability); we adjusted lower probability to give an OR of 2. We then used the upper probability value with the highest power (0.7) to assess statistical power for ORs of 2, 3, and 4 for sample sizes of 50–350 by increments of 50.
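The step of adjusting the lower probability to give a chosen OR follows from the definition of the OR as a ratio of odds. The short Python sketch below shows only that arithmetic; it does not reproduce the WebPower power calculation, and the function name is hypothetical.

```python
def lower_probability(upper_probability, odds_ratio):
    """Probability whose odds equal the upper probability's odds divided by the OR."""
    upper_odds = upper_probability / (1.0 - upper_probability)
    lower_odds = upper_odds / odds_ratio
    return lower_odds / (1.0 + lower_odds)
```

For example, an upper probability of 0.7 corresponds to odds of 7/3; halving those odds for an OR of 2 gives odds of 7/6, or a lower probability of 7/13 (about 0.54).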

Data Sources

Respiratory swab specimens that were positive for SARS-CoV-2 by real-time reverse transcription PCR were sent from clinical laboratories across the state for whole-genome sequencing at the NYSDOH Wadsworth Center as part of an enhanced genomic surveillance program. Samples were selected for sequencing on the basis of cycle threshold value and region of patient residence; the goal was full geographic coverage across the state. Sample selection criteria did not change over the course of the study period. We matched samples to demographics in the Communicable Disease Electronic Surveillance System and vaccination records in the New York State Immunization Information System. For persons from whom multiple samples were collected, we included only the earliest collected sample with genome available.

Vaccination status for each person was based on dates of sample collection and administration of vaccines. A person was considered unvaccinated if the sample was collected before any vaccination, vaccinated if the sample was collected >14 days after completion of vaccination (first dose of Janssen, second dose of Pfizer or Moderna vaccine), and boosted if the sample was collected any time after receiving a booster of any vaccine type. We removed from the study persons who were partially vaccinated (sample collected between initial dose and 14 days after vaccination completion, n = 261 [90 with Moderna and 171 with Pfizer vaccine]) and persons who received a greater number of vaccinations than normal. This study does not apply to persons who received a third dose as part of their vaccination series (e.g., potentially immunocompromised persons); these persons were removed from the dataset because of different vaccination history and low sample sizes (58 persons who received a third dose <135 days after their second dose were removed).
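The status definitions above can be summarized as a small Python sketch. The function and argument names are hypothetical, the treatment of samples collected on the same day as a dose is an assumption, and the removal of persons with additional primary-series doses is not modeled.

```python
from datetime import date


def vaccination_status(sample_date, first_dose=None, completion=None, booster=None):
    """Classify a sample as unvaccinated, partial, vaccinated, or boosted.

    `completion` is the date of the dose that completes the series (the
    first dose for Janssen, the second dose for Pfizer or Moderna).
    """
    # Unvaccinated: sample collected before any vaccination.
    if first_dose is None or sample_date < first_dose:
        return "unvaccinated"
    # Boosted: sample collected any time after a booster.
    # Assumption: a sample taken on the booster date does not count as boosted.
    if booster is not None and sample_date > booster:
        return "boosted"
    # Vaccinated: sample collected >14 days after completing the series.
    if completion is not None and (sample_date - completion).days > 14:
        return "vaccinated"
    # Otherwise partially vaccinated; such persons were removed from the study.
    return "partial"
```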

Sequencing Methods

We performed whole-genome amplicon sequencing of SARS-CoV-2 by using a modified version of the Illumina ARTIC protocol with ARTIC V3 primers in the Applied Genomics Technology Core at the Wadsworth Center, as previously described,[24] and amplified later samples with ARTIC V4 primers. We sequenced samples with particularly low virus titers by using AmpliSeq chemistry on the Ion Torrent S5XL sequencer, as previously described.[25]

GISAID accession numbers for sequences are available in an accompanying chart. In that chart, the first column shows the GISAID accession number, and the subsequent columns indicate whether the identification number was used in the respective analyses. Data are coded such that –1 indicates records that were removed before analysis, 0 indicates records that met the basic overall study criteria but were not matched for a particular analysis, and 1 indicates that the record was included in the analysis.