Correlation Between Clinical and Wastewater SARS-CoV-2 Genomic Surveillance, Oregon, USA

Devrim Kaya; Rebecca Falender; Tyler Radniecki; Matthew Geniza; Paul Cieslak; Christine Kelly; Noah Lininger; Melissa Sutton


Emerging Infectious Diseases. 2022;28(9):1906-1908. 

SARS-CoV-2 variant proportions in a population can be estimated through genomic sequencing of clinical specimens or wastewater samples. We demonstrate strong pairwise correlation between statewide variant estimates in Oregon, USA, derived from both methods (correlation coefficient 0.97). Our results provide crucial evidence of the effectiveness of community-level genomic surveillance.

Genomic surveillance to detect SARS-CoV-2 variants has become a critical component of monitoring the virus over time. Both patient- and community-level surveillance through the sequencing of clinical specimens and wastewater samples can detect variants and estimate their proportions in a population. Sequencing wastewater for SARS-CoV-2 variants is an emerging science that offers several advantages over patient-level surveillance, including reduced cost and tracking of cases regardless of symptoms or testing access,[1,2] but few data have demonstrated comparable effectiveness in estimating variant proportions over time.[3–5] We describe the correlation between SARS-CoV-2 variant proportions detected through sequencing of wastewater samples and clinical specimens in Oregon, USA, during February 7, 2021–February 26, 2022.

In brief, 24-hour composite samples were collected ≥1 time each week from wastewater treatment facility influents for sequencing. We quantified SARS-CoV-2 RNA concentrations via droplet digital reverse transcription PCR and sequenced positive samples on a HiSeq 3000 or NextSeq 2000 sequencer (Illumina) by using the Swift Amplicon SARS-CoV-2 Panel and Swift Amplicon Combinatorial Dual indexed adapters (Integrated DNA Technologies [IDT] Swift Biosciences), according to the manufacturers' protocols, as previously described.[6]

During each surveillance week of the study period, we used clinical specimen and wastewater sample data to estimate the proportion of SARS-CoV-2 variants according to US Centers for Disease Control and Prevention variant of concern (VOC) designations.[7] We defined the circulation period of each variant by its earliest and latest detections in either wastewater or clinical specimens; we included estimated proportions of 0 that fell within a variant's circulation period in all analyses. To estimate variant proportions using clinical data, we divided the number of specimens for each variant by the total number of SARS-CoV-2–positive specimens from Oregon submitted to the GISAID database by surveillance week.[8] To estimate variant proportions using wastewater data, we divided the statewide gene copies of each variant by the total gene copies of all variants by surveillance week. To derive the denominator, we normalized the SARS-CoV-2 concentration to wastewater influent flow at each facility and summed the values for all facilities by surveillance week. To derive the numerator, we multiplied the normalized SARS-CoV-2 concentration by the proportion of sequence reads for each SARS-CoV-2 variant detected at each facility and summed the values for all facilities by surveillance week.
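The two estimators and the circulation-period pairing described above can be sketched in Python as follows. This is an illustration only, not the authors' code. The function names, the record layout, and the use of integer week indices are hypothetical. "Normalized to influent flow" is assumed here to mean concentration multiplied by flow (a daily viral load), and a week with no data from one source is assumed to contribute a proportion of 0 for that source.

```python
from collections import defaultdict


def clinical_variant_proportions(counts_by_week):
    """counts_by_week: {week: {variant: number of sequenced specimens}}."""
    out = {}
    for week, counts in counts_by_week.items():
        total = sum(counts.values())
        if total > 0:
            out[week] = {v: n / total for v, n in counts.items()}
    return out


def wastewater_variant_proportions(records):
    """records: iterable of dicts with hypothetical keys
    'week', 'concentration' (gene copies per liter), 'flow' (liters per day),
    and 'read_props' ({variant: proportion of sequence reads}).
    Returns {week: {variant: proportion}} summed across facilities."""
    numer = defaultdict(lambda: defaultdict(float))
    denom = defaultdict(float)
    for rec in records:
        # Assumption: flow normalization = concentration x influent flow.
        load = rec["concentration"] * rec["flow"]
        denom[rec["week"]] += load
        for variant, prop in rec["read_props"].items():
            numer[rec["week"]][variant] += load * prop
    return {
        week: {v: n / denom[week] for v, n in variants.items()}
        for week, variants in numer.items()
        if denom[week] > 0
    }


def paired_estimates(clinical, wastewater):
    """Pair weekly proportions per variant within its circulation period,
    defined by earliest and latest detection in either data source.
    Zeros inside the period are kept; weeks outside it are dropped."""
    variants = {v for src in (clinical, wastewater)
                for props in src.values() for v in props}
    all_weeks = sorted(set(clinical) | set(wastewater))
    pairs = []
    for v in sorted(variants):
        detected = [w for w in all_weeks
                    if clinical.get(w, {}).get(v, 0.0) > 0
                    or wastewater.get(w, {}).get(v, 0.0) > 0]
        if not detected:
            continue
        first, last = detected[0], detected[-1]
        for w in all_weeks:
            if first <= w <= last:
                pairs.append((v, w,
                              clinical.get(w, {}).get(v, 0.0),
                              wastewater.get(w, {}).get(v, 0.0)))
    return pairs
```

Each tuple returned by `paired_estimates` holds one variant, one week, and the clinical and wastewater proportions for that variant-week; these pairs are the unit of analysis for a correlation between the two sources.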

We used the Pearson correlation coefficient (r) to assess the relationship between the statewide weekly estimated proportions of each VOC detected in clinical specimens and wastewater samples. We used simple linear regression with a least-squares regression line to assess goodness of fit (R²) and considered p<0.05 statistically significant. We used Stata version 17.0 (StataCorp LLC) for all analyses.
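For readers who want to reproduce this kind of comparison outside Stata, the sketch below computes the Pearson coefficient and a least-squares line from paired proportions in plain Python. The function names are hypothetical, and the significance test is deliberately omitted because it requires a t distribution that the standard library does not provide. In simple linear regression with a single predictor, the coefficient of determination equals the square of r, which the code uses directly.

```python
import math


def _centered_sums(x, y):
    """Return means and centered sums of squares and cross-products."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each have nonzero variance")
    return mx, my, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson correlation coefficient between paired values."""
    _, _, sxx, syy, sxy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x.
    Returns (slope, intercept, r_squared)."""
    mx, my, sxx, syy, sxy = _centered_sums(x, y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)  # equals pearson_r(x, y) ** 2
    return slope, intercept, r_squared
```

Here `x` would hold the clinical proportions and `y` the wastewater proportions for the same variant-weeks, either pooled across variants or restricted to a single variant.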

Of 488,308 confirmed COVID-19 cases in Oregon during the study period, 38,386 (7.9%) clinical samples were sequenced and submitted to the GISAID database. Of 2,948 wastewater samples collected from 42 communities, 2,852 (97%) tested positive for SARS-CoV-2 and 2,749 (96%) were sequenced. We included 233 pairs of estimated proportions in the correlation analysis and rounded all estimates to the nearest 0.001.

Overall, statewide weekly estimated percentages of each SARS-CoV-2 variant detected in clinical specimens were strongly associated with those from wastewater samples; r was 0.97 for all variants (p<0.0001) (Figure). However, r varied by SARS-CoV-2 variant, from 0.61 for Beta to 0.98 for Delta, and we noted a general increasing trend in r as total variant proportions increased (Table). A scatter plot demonstrated a linear relationship between estimated percentages of each variant derived from clinical specimens and wastewater samples (Figure, panel B). The conditional SD was greatest for proportion estimates of 0.2–0.6. Simple linear regression demonstrated a strong linear relationship between estimated proportions derived from the two genomic surveillance data sources (R² = 0.94; p<0.0001).


Figure. Comparison of SARS-CoV-2 genomic sequence data from confirmed COVID-19 case clinical specimens and wastewater samples collected in Oregon, USA, February 6, 2021–February 26, 2022. A) Percentages of different SARS-CoV-2 variants detected during each epidemiologic week. B) Scatter plot comparing variant detection frequency by sample type. Clinical specimens were retrieved from the GISAID database.

Our pairwise correlation analysis demonstrates the effectiveness of wastewater sequencing for estimating SARS-CoV-2 variant proportions at the statewide level over time and at varying prevalences. Overall, the association between estimates of variant proportions produced from clinical specimens and wastewater samples was strong. However, correlations varied by VOC and were weakest for the least prevalent variants.

A limitation of wastewater surveillance is that it excludes populations without access to municipal sewer service (i.e., those with septic systems); therefore, it might not be generalizable to all populations within a state. However, for areas served by municipal sewer systems, leveraging wastewater surveillance for SARS-CoV-2 genomic surveillance offers several advantages over estimating variant proportions from clinical specimens. Because wastewater surveillance does not rely on healthcare access, testing acceptance, or molecular testing availability, it likely provides more robust and less biased estimates than sequencing of clinical specimens. Thus, wastewater genomic surveillance could prove valuable in surveillance for many other pathogens of public health concern.