Know Thine Enemy: Why Genetic Sequencing Is Key to Tracking COVID-19

Amber Dance

April 14, 2021

In a typical year, the surveillance team at the University of Washington's clinical virology lab runs about 50,000 tests to identify viruses. Since the first Covid-19 case hit Seattle, where the lab is based, it has done about 2 million. "Forty years of testing, in one year," says its assistant director, Alex Greninger.

That lab is also one of many, spread across state, private and university facilities, that are reading the viral genomes of positive test samples to see if there are any worrisome changes in the virus. The importance of that search became more obvious in December 2020, with the reporting of the first "variant of concern," B.1.1.7, out of the United Kingdom. It has mutations that let it spread more easily than the original SARS-CoV-2 coronavirus.

The rise of that variant, plus B.1.351 from South Africa and P.1 from Brazil, was among the factors leading to a renewed focus on surveillance by sequencing, that is, cataloging the order of chemical subunits of the virus's genetic material. The Biden administration has pledged almost $200 million to boost the sequencing effort, and Congress recently approved a $1.75 billion infusion for a program of the Centers for Disease Control and Prevention that includes sequencing.

Sequencing in the US has been stymied by decentralized health providers and payers, a slow rise in testing and underfunding. The situation is improving, and the CDC and its partners are now reporting well over the agency's initial goal of 7,000 sequences per week. However, that's still far less than 5 percent of new cases, a share that some experts see as a good benchmark for genomic surveillance.

How Does Sequencing Help Against Covid-19?

Reading the SARS-CoV-2 genome is a key part of surveillance (which also includes testing, tracking cases and contact tracing). Once a genetic sample from someone's nose or throat is confirmed to be Covid-19-positive, scientists take that sample — a DNA copy of the virus's RNA-based genome — and sequence all of it. They chop the genetic material into bits and use machines to read the sequence of genetic letters (the chemical bases known for short as A, C, T and G) contained in those bits. They can figure out the whole viral genome from the overlapping pieces.
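The reassembly step described above, recovering a whole genome from overlapping pieces, can be illustrated with a toy sketch. This is only an illustration under simplifying assumptions: the function names and the short fragments are made up, the reads are error-free, and the greedy merging used here is a textbook simplification rather than what any lab mentioned in this article actually runs. Production pipelines handle millions of reads, sequencing errors and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    pieces = list(fragments)
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left overlaps; stop merging
        # Join the two pieces, writing the shared letters only once.
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return max(pieces, key=len)


# Hypothetical fragments of a made-up 15-letter "genome".
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGTT"]
print(greedy_assemble(reads))  # ATGGCGTACGTTAGC
```

Each merge joins two fragments where the end of one matches the start of the other, which is the sense in which the full sequence can be worked out from overlapping bits.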

The purpose of sequencing depends, in part, on the progress of the outbreak, says Steve Schaffner, a computational biologist at the Broad Institute of MIT and Harvard University. "In the very beginning, you need to know what it is that's infecting people," says Schaffner, who coauthored a summary of genome analysis during viral outbreaks for the Annual Review of Virology. For SARS-CoV-2, the sequences first obtained in China were quickly used to begin vaccine design.

As the disease spread, scientists used differences in the sequences to build a sort of family tree for the virus, figuring out how it traveled from person to person and place to place. That has important implications for public health measures, says Schaffner. If a virus is spreading only locally, for example, then closing the borders won't do much good. Or if a disease tends to "superspread" from one person to many, as Covid-19 does, then contact tracing should focus on finding the first person to instigate a cluster of cases.
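The idea of using sequence differences to link related infections can be sketched in a few lines. This is a deliberately simplified illustration: the sample names and 10-letter sequences are invented, the helper names are hypothetical, and real family trees are built with dedicated phylogenetics software on full genomes rather than by simple difference counts.

```python
def count_differences(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def closest_relatives(samples):
    """For each sample, name the other sample with the fewest differences."""
    result = {}
    for name, seq in samples.items():
        others = [
            (count_differences(seq, other_seq), other_name)
            for other_name, other_seq in samples.items()
            if other_name != name
        ]
        result[name] = min(others)[1]
    return result


# Invented sequences, for illustration only.
samples = {
    "sample_A": "ACGTACGTAC",
    "sample_B": "ACGTACGTAT",  # one change from sample_A
    "sample_C": "ACGAACCTAC",  # two changes from sample_A
    "sample_D": "ACGAACCTGC",  # one further change from sample_C
}

for name, relative in closest_relatives(samples).items():
    print(name, "is closest to", relative)
```

In this toy data, samples A and B pair up, as do C and D, hinting at two separate chains of spread. That is the kind of pattern that helps distinguish local transmission from repeated introductions.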

Now, as the pandemic moves into its later stages (we hope) and the virus evolves, "all the focus is on these variants," says Schaffner.

What Went Wrong in the US?

Compared with other nations, the United States' performance in the sequencing arena has been rather lackluster. While as of April 9 the US had submitted 210,000 sequences to a global collection, that amounts to fewer than 7 sequences for every 1,000 cases it's had, compared with more than 200 sequences per 1,000 cases in Iceland, Australia, New Zealand and Denmark. The UK, with 320,000 sequences shared, is the biggest contributor.
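The per-1,000 comparison above is simple arithmetic, sketched here for readers who want to check it. The function names are illustrative. The article does not give the US case count, so the code only works out the minimum number of cases implied by "fewer than 7 sequences for every 1,000 cases."

```python
def sequences_per_thousand_cases(sequences, cases):
    """Sequencing rate expressed per 1,000 confirmed cases."""
    return 1000 * sequences / cases


def implied_minimum_cases(sequences, rate_per_thousand):
    """Case count at which `sequences` corresponds to exactly the given rate."""
    return 1000 * sequences / rate_per_thousand


US_SEQUENCES = 210_000  # figure quoted in the article, as of April 9

# A rate below 7 per 1,000 means the US had more cases than this:
print(implied_minimum_cases(US_SEQUENCES, 7))  # 30000000.0

# Countries at 200 per 1,000 sequence a share of cases roughly this many times larger:
print(round(200 / 7, 1))  # 28.6
```

In other words, the quoted US rate implies more than 30 million cases, and the leading countries were sequencing a share of their cases nearly 30 times larger.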

"Most countries are doing poorly," says Schaffner. "We've been on the poor end." But the US situation is improving, he adds, with nearly an "adequate" number of sequences coming out now.

US labs couldn't focus on sequencing until they got up to speed with basic Covid-19 testing, which was a sluggish process, says Esther Babady, director of the clinical microbiology service at Memorial Sloan Kettering Cancer Center in New York. "By the time the UK was doing all of this sequencing, we were still struggling with just diagnosing," she says.

The US, with dozens of public health departments, was not set up for a large, coordinated sequencing effort. Plenty of public health, university and private labs "have put a lot of effort into sequencing and are doing a phenomenal job," says Shirlee Wohl, a genomic epidemiologist at Johns Hopkins Bloomberg School of Public Health and coauthor of the review on genomic analysis of viral outbreaks. But the work has been decentralized. Beyond the act of sequencing, labs also need to arrange access to samples, set up reporting pipelines and obtain permission to publish their results. That infrastructure was lacking. The CDC didn't set up a sequencing surveillance collaboration until May of 2020.

Then there's the question of who pays for sequencing. Diagnostic or screening sequencing is done for the sake of the individual, and health insurance pays for it. But surveillance is a bit different. It's for the community's sake, and the samples are supposed to be de-identified. Privacy is central to HIPAA, the nation's health privacy law. So if Greninger's or Babady's sequencing efforts find that a sample contains a strange new variant, they shouldn't be able to trace that virus back to an individual patient or inform that person of what they're carrying. Health insurance doesn't pay, and other public funders like the CDC must step in.

Contrast the US situation with that in the UK, where the National Health Service took part in a £20 million (~$27 million) sequencing consortium launched in April 2020, aiming to sequence tens of thousands of viral genomes; the total is now over 400,000.

What Happens When Sequencers Find a Variant?

The average SARS-CoV-2 virus accumulates about two mutations each month. Only some of the changes will be dangerous. But which ones? One clue is when a variant spreads quickly; that suggests it might be better at infecting people. Scientists must then go on to test in the lab whether a new variant virus is unusually good at sticking to human cells, say, or at evading antibodies.

With a year's worth of data under the world's collective belt, it's getting easier to predict whether a new mutation is bad news, says Greninger. For example, the variants from the UK, South Africa and Brazil all contain a change such that the virus's "spike protein" now has the amino acid tyrosine at a particular spot, rather than the original amino acid asparagine. That little change makes the virus better able to attach to the human protein called ACE2 that it uses to infect cells. Any variant with the same change is likely to be more transmissible, and therefore bad news.

To make predictions easier, researchers at the Fred Hutchinson Cancer Research Center in Seattle took the gene that carries instructions for making the part of the spike that engages ACE2, and then created thousands of mutant versions. Their collection encoded nearly every possible amino acid swap in the part of the protein that binds to ACE2. The researchers then made spike protein from those genes and checked how well each mutant protein could bind to ACE2.

They found that most changes disabled the spike or made the binding weaker, but the team also identified changes that enhanced the interaction. "That's been a big help," says Greninger. If such a mutation shows up in a circulating variant, it could indicate the virus is better able to stick to and enter cells, and thus is more infectious.
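To give a sense of what "nearly every possible amino acid swap" involves, the sketch below lists every single-letter substitution for a short stretch of protein. It is only an illustration: the four-letter sequence is made up, the function name is hypothetical, and the code enumerates labels rather than modeling the lab work or the binding measurements.

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_substitutions(protein):
    """Labels such as 'N2Y' for every possible single amino acid swap."""
    mutants = []
    for position, original in enumerate(protein, start=1):
        for replacement in AMINO_ACIDS:
            if replacement != original:
                mutants.append(f"{original}{position}{replacement}")
    return mutants


toy_region = "GNGY"  # invented sequence, not real spike protein
mutants = single_substitutions(toy_region)
print(len(mutants))      # 76
print("N2Y" in mutants)  # True
```

Each position can be swapped for any of 19 other amino acids, so a region of L amino acids yields 19 × L single mutants, which is why the collection ran to thousands. The label "N2Y" here means asparagine (N) at position 2 replaced by tyrosine (Y), the same type of swap described for the circulating variants.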

Once variants are deemed dangerous or worrisome, there are more steps to take. Labs can develop diagnostic tests specific to the variant, so they can identify it in samples within just a few hours, without sequencing.

Variant data also help policy makers decide about public health measures, such as lockdowns. And vaccine makers can tweak their designs to better combat new variants, as Moderna is already doing with an RNA-based vaccine against the worrisome South Africa variant.

What Is the Situation Going Forward?

US sequencing efforts have been ramping up. The CDC has been collecting SARS-CoV-2 samples from public health agencies since November and has contracted with commercial and university laboratories to add to its sequencing capabilities.

The $200 million coming from the federal government, which the Biden administration calls a "down payment," is expected to more than triple sequencing rates. (Greninger estimates that every viral sequence costs $100 to $200, including the staff time involved.) The American Rescue Plan Act recently approved by Congress includes an additional $1.75 billion for the CDC's Advanced Molecular Detection program that includes sequencing, as well as funds for associated infrastructure.

"The CDC is much more aggressively contracting and paying for sequencing," says Greninger, and he thinks sequencing 5 percent of cases is "a good stretch goal."

But the future holds challenges for sequencing, too. As at-home tests are rolled out, and as more tests rely on analyzing the virus's proteins instead of its genetic material, there may be fewer genetic samples available for surveillance, Greninger notes.

There is another source of sequences undergoing testing: wastewater. Though wastewater sampling can't show whether specific individuals have the virus or a variant, it offers a broad-strokes picture of infections in a population. Wastewater surveillance also may identify a rise in cases a week before patients begin filling hospitals, says Schaffner.

Greninger and Schaffner would love to get even more out of their sequencing efforts. When sequences are linked to de-identified medical records, Schaffner says, it offers more clues about how those sequences determine the course of disease or how effective the various vaccines are at intercepting different variants. He'd like that process to be more routine, so that medical data could easily be integrated with surveillance without violating patients' privacy.

Sequences could also become useful clinically, Greninger says, to decide which therapies a person should get. This already happens with other infections; for example, sequencing the virus in the blood of someone with HIV can tell doctors which drugs will or won't work.

In fact, Greninger dreams of universal sequencing of Covid-19 cases, both for every patient's benefit and for public health. That would require more investment in technology and infrastructure, but once in place, those resources could be used to sequence and understand other viruses too, he says. "This has been an awful year, and we've just got so much growth that will come out of it."

This article is part of Reset: The Science of Crisis & Recovery, an ongoing Knowable Magazine series exploring how the world is navigating the coronavirus pandemic, its consequences and the way forward. Reset is supported by a grant from the Alfred P. Sloan Foundation.

This article originally appeared in Knowable Magazine on April 9, 2021. Knowable Magazine is an independent journalistic endeavor from Annual Reviews, a nonprofit publisher dedicated to synthesizing and integrating knowledge for the progress of science and the benefit of society.