COVID-19 Modeling and the Path to Herd Immunity

Eric J. Topol, MD; Youyang Gu, MA


February 08, 2021

This transcript has been edited for clarity.

Eric J. Topol, MD: Hello. This is Eric Topol, Medscape editor-in-chief. I'm delighted to have a chance to have a conversation, one on one, with Youyang Gu, previously from MIT and now doing incredible work on COVID-19. Welcome, Youyang.

Youyang Gu, MA: Thank you, Eric. It's an honor to be on Medscape today and to chat with you.

Topol: You are the first data scientist we've ever interviewed, which is amazing. I know you're only 27, which is also amazing, and your contributions have been vast. Before your work on COVID-19, you went to MIT and did a double major in electrical engineering and computer science, and math, and then got a master's degree at MIT. You've also done work with natural language processing and at CSAIL. But let's go back to your upbringing and some of your prior work in data science. Where did this all get started for you?

Gu: I studied computer science at MIT, and it wasn't until probably my master's that I started getting into machine learning and data science. I had more of a systems background in my undergrad. For my master's, I worked on natural language processing and used deep neural networks to do a lot of these difficult natural language processing tasks. That's how I got my first exposure to big data and building statistical models to make predictions on data.

After I graduated, I spent a couple of years working in finance. That's where I further honed my modeling background because, as you can probably imagine, it can get very quantitative, and the goal there is to just be as accurate as possible.

Topol: A lot of the great algorithmic talent goes to the finance industry, right?

Gu: Yes. I met some really brilliant and talented people during my time there. I feel like those couple of years helped me develop the skills that I've been using now for the past year while making sure that my COVID modeling is up to par and can be as accurate as possible.

Topol: But you also did work in sports analytics too, right?

Gu: Yes, I did some work in that, and I was working on that right before the pandemic hit. Actually, that's one of the reasons I got started in modeling COVID, because everything was shut down in March. Like many Americans, it impacted me. I was just curious to see where COVID was going, when it could potentially be improved.

The existing models at the time, back in March 2020, weren't doing as great of a job as I thought they could do. So I took a shot at building my own model to see what I could do. I guess it took off from there.

Topol: Well, you're not kidding that it took off. Before we get into your model, which you singlehandedly run... I mean, you don't have a team, right?

Gu: Yeah, it's just me.

Topol: It's Youyang Gu's COVID-19 projection.

Before we get into that, did you know when you were in the crib that you were going to become a computer data scientist guy? When did you figure this out?

Gu: I actually didn't, really... Growing up, I liked math and science, but I didn't really have much exposure to computer science until toward the end of high school. I had a really good computer science teacher. My dad was the one who was pushing me to study computer science because that's his expertise and that's what he is doing. I was always kind of like, "Oh, I don't want to do what you're doing. I want to take my own path."

But then once I got into MIT, I saw how much this technology and computer science is changing our world, and in many instances, for the better. That's what inspired me to dive into it and learn more about it.

Topol: That's really fascinating. Maybe there is a little genetic talent component to this, but you have definitely found a sweet spot of unmet need.

Projected Deaths: 60,000 or 2 Million?

Topol: Here it is, March 2020. The pandemic is starting to go into full force. You're looking at all of these models out there like the University of Washington IHME and others, and you see that they're really not performing very well. Basically, you said, "I'm going to set one up." When you made that decision, it was a big deal because in the vanguard, you became the number-one go-to modeling force out there. What was it that made you think I can do this really well?

Gu: I didn't expect going into it that I was going to have this model that everyone in the world was going to look at. I started because I was just curious to see where the trajectory was going because, at that time in March, things were getting worse and worse by the day.

The question that a lot of people had was, when would things peak? And so you had two ends of the spectrum where one model, say, like the IHME, was forecasting 60,000 deaths by the summer. And then on the other end of the spectrum, you had the Imperial College model that was estimating 2 million deaths by the summer. That's a pretty wide range, so I didn't really know what to make of it. That's why I thought it was better to write my own model and see who was right.

Topol: What were your inputs that made your model perform so well?

Gu: I think the answer to that would be better phrased as what inputs did I not use, because the only input that I did use was previous deaths. I feel like the other models out there were using too many inputs, too many data sources.

From my experience, when the signal in the data is so low, the data quality is low, so the more data you give it, the worse your outputs tend to be. That's why I didn't want to make it too complicated and I just decided that I was going to use deaths, and I wanted to make it as simple as possible. I just built the whole model in, I think, less than a week. From the beginning to when I had the website live, it was around a week.

Topol: And it became the go-to model for The Wall Street Journal, The Economist, The New York Times, The Washington Post, CNN, NPR, Nate Silver — I mean, everybody.

Gu: I'm curious — how did you first hear about it?

Topol: I learned about you through Twitter. Some people were writing to me early on in the pandemic saying, "Don't even look at those other models. There's only one to look at and it's Youyang's."

I started looking at yours every day. And it was so different from the others and so much more accurate. I just kept testing it and you were phenomenal. You were like the oracle or the soothsayer. It's amazing, actually.

No wonder it took off: you delivered on accuracy, unlike the other models from these very prestigious academic centers with teams of people. Here you are, one person, and you're out with better forecasts. It's incredible. It really was amazing to see this, and it went on, obviously, for many, many months.

You would interact with people on Twitter because some of them would challenge you or bring up controversies. How would that go?

Gu: That was actually the most surprising aspect of this whole thing. I felt like, at least to me, the building-the-model part was fairly straightforward in the sense that I've had prior experience at building statistical models. The hard part was starting on Twitter and trying to advertise my model, and getting people to first look at it.

I found that to be the most challenging part because, in the initial stages, I would just be tweeting at reporters, people in academia, and epidemiologists, saying, "Hey, can you look at my model?" Almost all of them went unanswered. Then it all happened in the span of like a week or two, where everything just blew up and suddenly I had all these people who previously weren't responding to me suddenly writing to me, saying, "Yeah, let's talk."

Since then, I've been pretty active on Twitter and I've seen a lot of different things happen on Twitter. It's been an interesting ride. I think you're right: There have been many of those people who are very supportive, but there are also people who are very critical.

I think I try to take in the criticism as much as I can. I think that's actually one of the reasons that my model has been so accurate, because I've been able to get that feedback from hundreds of people at once. Of course, some people put their criticisms more nicely than others.

Topol: Twitter brings out some of the worst, raw kind of communication. You always are very courteous and very respectful. That's another part about you that I think is very distinctive. I've never seen any of your responses that were at all disparaging. You come across through social media as a gentle, respectful, highly obviously intelligent person. You're a model. Not just a model, but a model for communicating.

Separating From the Pack

Topol: Now, one of the things that was curious to me is that you then separated from the pack. You're the most accurate model for COVID-19, and then you decided in the late fall, October or November, that you're going to just stop the projections. What made you do that?

Gu: I think you touched upon it a little bit. First of all, I appreciate your kind words.

For me, what people see on Twitter is me being courteous and respectful, but it hasn't always been that way. I'm sure you've experienced this too, but being on Twitter is sometimes just challenging and it's draining.

Topol: Yeah.

Gu: Especially when you have all these people who are criticizing your model. Some criticisms are valid, but many times they're just not. For me, I felt like that daily [activity] was tiring me out and it became more of a grind to have to update the model every day.

Topol: And you're doing this all yourself with no grants or funding, right?

Gu: Right. Up until this past year, I've just done everything on my own. I didn't accept any donations or any grants or anything like that.

Topol: That's amazing. Did you have a day job too, so that you could have food and pay your rent?

Gu: I focused on this full-time and I was lucky that my other job was able to support me just for some time. I guess another reason I had to step back was that by the time October came around, the CDC had over 30 models doing death forecasting. I was just one model out of 30, whereas when I first started, maybe there were five models existing, right? And most of them weren't that great.

Topol: Yes. I think the CDC, from what I could gather, learned a lot from you. I think they were doing the Gu adjustment in many of their decisions.

After you decided to take the time out from the COVID-19 projections, you came back with the COVID-19 infections and the path to herd immunity. Was that because you thought, I just can't stay away? I really am into this and now we're going to the next phase? What were you thinking when you decided that you were going to go into phase 2 of Gu?

Gu: When I stepped back in September or October, I didn't expect to be coming back. Back then, that was when that fall surge was really taking off. During that month when I was basically taking a break, I had many people message me and say, "There's a bad surge coming up and we're about to enter a dark period, but it currently feels like there's no guidance, especially on the modeling front."

I heard that from many individuals, including some in the public health community. That got me thinking about what is it that existing models are doing or aren't doing that would be useful. When I talked to several people, I realized that one of the common themes was that people wanted to know what percentage of the population in each state or the country have been infected. That was something that many of those models weren't doing. They were only forecasting deaths.

That's what made me decide to come back and work on this infection estimate. I had been dabbling in it; I think I might have sent you an initial draft back in July.

Topol: I remember that.

Gu: So I thought, Why don't I just tweak that because it seems to be useful and make a model based off of the research that I did back in July. Then I spent a couple of days putting that together and then released it, I think, in November. That's what I've been working on since then. Of course, with the vaccines coming out, I wanted to just do some modeling on that front. That was where the path to herd immunity came about.

Topol: And here again, you distinguish yourself as really being on top of that. One theme that I think is really important is the lesson that you have taught us, which is the whole issue of simple models. You reinforced that having more inputs is not always the best; it's which inputs you focus on.

Modeling the Path to Herd Immunity

Topol: You were the first to show me that the so-called confirmed cases are a total miscue for how much of a burden of COVID there is. As you know, we've worked on the asymptomatic fraction and published papers on that, and found that at least 30%-35% of infections are asymptomatic.

By the way, just for those on this podcast, it's not just national modeling that Youyang does; it's every state. He drills down into every state. The point is that you went through all the serology studies and you recognized early on that the infection burden was far greater than the confirmed cases. At one point early on, I think it was somewhere around five- to sevenfold and now it's narrowed to two- to threefold because of more testing. Your latest estimates, which coincide with the CDC's (I think they got it from you again), are that about 83 million people in the US have now had a COVID infection. Is that about right?

Gu: As of maybe like a month ago, it was around 80 or 90 million. Now it's probably close to 90 million.

Topol: On the CDC site today [February 1] it says 83, but as you say, that's a little dated and it's been a very bad January. That's been enlightening. The other thing is that you've been tracking, of course, all the vaccination and trying to see a path to herd immunity. What's your sense about where we're headed now on this exit ramp of population-level immunity?

Gu: I've been posting weekly updates about where vaccination is going in the US, and last week I sent a pretty optimistic outlook where I said that maybe anyone who wants a vaccine would be able to get one by April, and the majority of adults in the US would be vaccinated by June. That was my estimate last week.

Since then, it seems like the pace of new vaccinations has plateaued. I think you tweeted about that yesterday as well. Actually, looking at the data that was just released by the CDC maybe an hour ago for this past week, there were fewer new vaccinations given than the week prior, which is concerning. We are actually slowing down the pace of getting new people vaccinated.

With that said, I think there are way more second doses being administered this week than last week. I think what's happening is that a lot of these doses are being distributed and given as second doses rather than first doses.

It is something I'm still monitoring to see if that pattern will continue, where first doses are not as prioritized as second doses right now. The increase in new vaccinations may not happen for a while. I'm not quite sure.

Topol: I think your point is a really important one because even though we saw some new high marks at 1.7 million vaccines administered in a day for a couple of days in the past week, there was a significant fraction of second doses in there. We're not gaining any momentum on new vaccinations. I know you've seen this, Youyang, but many have been clamoring for just giving first doses and delaying the second doses so we get more people vaccinated.

That's an unsettled controversy, of course, since the data from the trials are, except for the Johnson & Johnson vaccine, based on two doses. It's tricky. Another thing that you and I exchanged notes on is really interesting. Very few people have clued into the fact that there are these 90 million Americans who've already been infected and they already have some immunity. Maybe it's not as good as the vaccine, I think we'd agree, but it's still something.

That could be detected — in fact, guided — by quantitative rapid antibody testing, as an example. Let's say it's rounded off right now; it could be 100 million. Maybe half of these people wouldn't even want to get a vaccine, right?

Gu: I don't know. I haven't seen the surveys and seen what the exact percentage is, but I can imagine that a percentage of those people who have been knowingly infected may not want to get a vaccine.

Topol: It was more likely that you would get an infection if you didn't comply with the mitigation measures. Those are the same kind of people who are calling COVID-19 a hoax or the flu or whatever. Out of that large group of people who have had infections, there's a fair number of people who weren't going to have the vaccine, no matter what. They're not with the program, if you will.

I guess the point of this is, do you ever factor in that there's this big group of tens of millions of people that will help us get to herd immunity? Or do you only count the vaccinated people that get us there?

Gu: Yes, I have to admit that I do kind of assume that immunity comes from two sources, vaccination and infection. I'm not sure why, when I read the news and things like that, I don't see that reference to people who have, it seems like, long-lasting immunity from prior infection. That is going to give us— I don't think "head start" is necessarily the right word, but we are starting at 25%-30% on the way to herd immunity at 60%-80%. We are not starting from zero.
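
Editor's note: The arithmetic behind "we are not starting from zero" can be sketched in a few lines of Python. This is only an illustration, not Gu's model. The infection counts (80 to 90 million) and the 60%-80% threshold range are the figures quoted in this conversation; the US population of roughly 330 million and the function names are assumptions added here.

```python
# Back-of-the-envelope arithmetic only; not Gu's actual model.
# US_POPULATION is an assumption added for illustration (about 330 million).
US_POPULATION = 330_000_000


def share_infected(infected: float, population: float = US_POPULATION) -> float:
    """Fraction of the population with a prior infection."""
    return infected / population


def remaining_gap(threshold: float, already_immune: float) -> float:
    """Share of the population still needed to reach the threshold (never negative)."""
    return max(0.0, threshold - already_immune)


if __name__ == "__main__":
    # Infection counts and threshold range are the ones quoted in the interview.
    for infected in (80e6, 90e6):
        start = share_infected(infected)
        for threshold in (0.60, 0.80):
            gap = remaining_gap(threshold, start)
            print(
                f"{infected / 1e6:.0f}M infected: start at {start:.0%}, "
                f"threshold {threshold:.0%}, gap {gap:.0%}"
            )
```

With these inputs, the starting point lands at roughly a quarter of the population, in line with the 25%-30% range Gu mentions.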

Topol: This is critical. We're not going zero to 80% or zero to 75% or whatever that threshold is. We're going from 25% or 30% to 80%. It's a whole different look.

What I like about this, Youyang, is your optimism. Again, I have developed so much respect for your ability to forecast things using data, and I think this is very rational. That is, the idea that there's a silver lining of this infection that's ripped through the United States, which is that we may be able to capitalize on that and get to herd immunity at an earlier point in time than most envisioned.

Do the other models about herd immunity factor in these tens of millions of people, or do they just disregard them?

Gu: I haven't really seen many other models that try to do this type of analysis that I did.

Topol: The Washington Post had one and they ignored it completely.

Gu: From what I read of the news, I think I've only seen reaching herd immunity in the context of the vaccinations. I've read that you need 70%-80% of the population vaccinated, although I think, realistically, it's a little bit lower, but not that much lower because of the overlap between the people who were infected and the people who are vaccinated.
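
Editor's note: The overlap point can be made concrete with a small sketch. It assumes, purely for illustration, that vaccination and prior infection are statistically independent and that both confer full immunity; neither assumption comes from Gu's model, and the function names and the 75% target are hypothetical.

```python
# Illustrative sketch only. Assumes vaccination and prior infection are
# independent and both give full immunity (simplifications, not Gu's model).


def combined_immunity(vaccinated: float, infected: float) -> float:
    """Share immune from either source, counting the overlap only once."""
    return vaccinated + infected - vaccinated * infected


def vaccination_needed(target: float, infected: float) -> float:
    """Vaccinated share needed so that combined immunity reaches `target`."""
    if infected >= target:
        return 0.0
    # Solve v + i - v*i = target for v.
    return (target - infected) / (1.0 - infected)


if __name__ == "__main__":
    target, infected = 0.75, 0.30  # hypothetical example values
    needed = vaccination_needed(target, infected)
    print(f"Naive subtraction: {target - infected:.0%} vaccinated")
    print(f"Accounting for overlap: {needed:.0%} vaccinated")
```

Under these assumptions, the vaccinated share needed falls below the target itself but stays well above the naive difference, because some vaccine doses go to people who were already infected.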

Variants and Herd Immunity

Topol: I don't know if you factor this in, but the immune escape and the challenges with the new strains, whether it's the UK B.1.1.7 or the South African B.1.351, these may pose some additional stresses on us getting there, of course.

Gu: That's the biggest challenge right now with the model that I have as well. I try my best to explain that the model only works under the set of assumptions that I've outlined. And that assumption right now only includes the B.1.1.7 variant.

Just from reading the articles and papers in the past couple of weeks, it seems like there are many of these other variants that are now taking hold in different communities in the US. If those take off... Some of them are quite worrisome because they also seem to potentially escape vaccination immunity. If the efficacy of the vaccine drops to 60%, then all bets are off in regard to this current model on our path to herd immunity. If the R0 increases by 50%, then that's going to significantly change the model, too.

That's something I'm actively monitoring, but right now it's just too early to try to figure out exactly what's going on. I think we'll need to wait a couple more weeks to see where things are at.
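
Editor's note: To see why a drop in efficacy to 60% or a 50% rise in R0 matters so much, here is a sketch using the textbook herd immunity threshold, 1 - 1/R0, and the vaccination coverage it implies when vaccination is the only source of immunity. This is a standard simplification and not the formula in Gu's model; the baseline R0 of 2.5, the 95% efficacy figure, and the function names are assumptions chosen for illustration.

```python
# Textbook simplification, not Gu's model. Baseline R0 and the 95% efficacy
# figure are assumptions for illustration; prior infection is ignored here.


def herd_immunity_threshold(r0: float) -> float:
    """Classic threshold 1 - 1/R0 (zero when R0 is at or below 1)."""
    return max(0.0, 1.0 - 1.0 / r0)


def coverage_needed(r0: float, efficacy: float) -> float:
    """Vaccinated share implied by the threshold; above 1.0 means unreachable
    by vaccination alone."""
    return herd_immunity_threshold(r0) / efficacy


if __name__ == "__main__":
    baseline_r0 = 2.5  # hypothetical baseline
    for r0 in (baseline_r0, baseline_r0 * 1.5):  # a 50% increase in R0
        for efficacy in (0.95, 0.60):
            need = coverage_needed(r0, efficacy)
            note = " (unreachable by vaccination alone)" if need > 1.0 else ""
            print(f"R0={r0:.2f}, efficacy={efficacy:.0%}: "
                  f"coverage needed {need:.0%}{note}")
```

In this simplified picture, either change pushes the required coverage sharply upward, and the two together exceed 100%, which is one way to read "all bets are off."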

Topol: I know that's important. My friend Michael Osterholm from the University of Minnesota was on Meet the Press this past weekend, and he said that while things are looking a little better for the US right now, we're in for the hurricane, category 5.

Do you think that we're going to go through a really rough patch in the weeks ahead? I know this is not part of the path to herd immunity, but it's like a bump in the road here because of B.1.1.7. In San Diego, we're estimating it right now at about 7%. Obviously, it's not homogeneous throughout the country, but what's your sense about the next phase of the pandemic in the US?

Gu: That's my focus these next few weeks — looking at the variants and trying to see if I can incorporate them into my modeling. I think from a public health perspective, it is important to lay out the worst-case scenario of what could happen. Given what we've seen in other countries, like Israel, UK, and the Czech Republic, this kind of fourth wave is definitely very well in our realm of possibility.

With that said, I'm not sure if that's a certainty. It's not a guarantee that we're going to see this fourth surge. For me and my model (you mentioned that you like that I am optimistic), I try to be as realistic as possible. I try to present these unbiased estimates that could be higher or could be lower but, on average, should be right.

I understand that from a public health perspective, you want to have these more conservative estimates because you want to prepare for the worst case. The purpose of public health is to prepare for these worst-case scenarios. I totally understand the experts saying there could be this potentially catastrophic surge. I think it's possible, but I don't know if it's likely.

Topol: I hope this idea that it won't happen here comes true. I'm afraid that this B.1.1.7, which is the one that's going to dominate likely, is a triple whammy. It has superspreader impact and infectiousness. Many of the studies out of the UK suggest a 30% increase in lethality. And then there's also some immune escape, as we saw with the recent Novavax vaccine, where they actually did genomic sequencing in their cases.

It doesn't look good, but I hope you're right, Youyang. I hope that we don't see it. I'm afraid that it's inevitable. I also like to be optimistic like you, but I am afraid that unless we do something more to protect ourselves, whether that's better masks or rapid home tests or all sorts of things we could do, this might be a replay.

As you know, in the first wave of the pandemic, we watched the UK. We watched Italy, France, and Spain. We said, "No, that won't happen here." Look what happened, and you predicted all that. Now we're watching the UK, Ireland, Israel, and Portugal. These are countries very hard hit by this variant. I don't know how we could be immune from the variant, but we'll see, I guess.

Gu: I totally agree. We look at Israel and the UK, and you can see that even though they've put in all this effort to contain and manage, they still were not able to successfully contain that variant. For the US, I agree with you. It seems like it's inevitable. I guess the silver lining is (again, it's not really a silver lining) that if you look at, for example, California, we estimate that a quarter of the population has already been infected. If you look at Los Angeles County, it's probably closer to 35% and maybe even up to 40% by now.

Even though that's not quite herd immunity, it does seem like that, in combination with the vaccines being rolled out—and I've seen your tweets about it. It really is a race between vaccinations and this variant. I hope that we're able to squeeze that out and ramp up the vaccination efforts in time to really hold back this variant. I think it's possible that we will see an increase in cases. That's more likely than not. Maybe this existing immunity and the vaccinations will be just enough to prevent a full-on outbreak.

Topol: I hope you're right and that's why I've been pushing hard to get this vaccination into the highest gear possible. That's why your point about the fact there's been a drop-off in the first dose in the past week or so is also notable.

Now, what I like is that you provide an objective view. You're looking at data. You're projecting data. What's your overall sense about the pandemic in the United States? Many will say it has been botched to the nth degree and still is not on track. Is that your thought or is this something that you think is a bad virus and we just kind of worked our way through? What is your overall gestalt of this?

Gu: It's definitely been a very eye-opening experience working on this over the past year, just being able to see all the different sides and all the different arguments. I think at the end of the day, really, I have the data and I was able to test many of these arguments that people are making on both sides of the spectrum.

I think, unfortunately, the fact that it happened in an election year definitely made this more political than it should have been. I think there are definitely things we can point out and say that we dropped the ball and this is what we could have done better.

Topol: Right. Of importance, when you brought out the concept of a partial herd immunity, because of places that were getting to high infection rates, the Dakotas and many places that were extremely hard hit, the Great Barrington group and others would latch on. Scott Atlas said, "Oh, let it rip; this is our way to herd immunity." Of course, that's never really been achieved without a vaccine for a virus. These are some of the political machinations that you had to deal with.

New Horizons

Topol: What's your next step? Where are you going in your career, Youyang? When we get through this pandemic — which we will, of course; we may have an endemic virus, but we'll get through it — what are you going to do next, do you think?

Gu: I think it's still too early to tell. I feel like things are just always changing, especially over this past year. It's very hard to, even for me when my job is to predict things, predict what I'm going to be doing in 3 months or a year from now. I've learned a lot over the past year. I think that there are definitely things, especially in the public health modeling perspective, that could use a more data-oriented approach to solving problems.

Topol: Yes, but if I was at the University of Washington, the IHME, I would want to bring you on to be running the projections. Have any of these groups around the world that are doing this ever contacted you to try to recruit you?

Gu: I've had a couple groups reach out to me, not the University of Washington.

Topol: They're on CNN with Anderson Cooper almost every night, and you're the one who actually had the far more accurate outputs and forecasts. Anderson Cooper hasn't discovered you yet, I don't think, right?

Gu: That's tough to know. I don't blame them because I probably don't have the credibility or the credentials, so the viewers may not know who I am.

Topol: I sure know, and I think many people in the know, know who you are. I hope that this podcast will increase the awareness. You're a data science prodigy. You've overtaken all the established sources in terms of coming out with really great output and projections.

There are a lot of people out there, young like you or even younger, that are thinking about a career as a data scientist, as having an AI-computer path in their career. What would you say to them? Obviously there's limited talent. There's a mismatch of people who are trained and great at this, thinkers like you, vs the need across every industry. What would be your advice in terms of career path?

Gu: It's good to explore different industries and see what you're interested in. For me, that was one of the main things that I was able to attribute my success with COVID-19 modeling to; I've had the experience of working in these different fields, from academia to finance to tech to sports. I've learned what skills are translatable from industry to industry and also just what I'm good at and what I'm not good at.

My advice would be to not be afraid that you may not have experience in a certain field and to just go for it and try. The worst thing that can happen is that it doesn't work out. Then you can just move on to the next one.

For me, it's just knowing that I am able to keep being interested and learning and continuing to hone my skills over the years, and prepare my techniques so that when something like COVID-19 came along, I was able to really hunker down and put something together in a short amount of time.

Luckily, it was useful and people found it helpful. I'm just glad that I was able to be a part in contributing to the efforts to combat and understand COVID-19.

Topol: Wow, you're very humble. I can't thank you enough on behalf of the medical community. I would even venture to say the same on behalf of so many people in the public who have followed you throughout these big modeling initiatives that you took on yourself. You transcended and superseded the models that were preexisting and, in the case of herd immunity, you're standing alone in many respects.

Thank you for all of your effort. This has been a big, consumptive affair for you. You've really put all your effort into something that you're not really getting any benefit from. You're just helping us all. You're a guiding light, literally, through the use of data, through the use of modeling, machine learning. You're helping us every day.

Youyang, thanks for joining us today. We look forward to following your career. I know, without question, that you'll have a lot of other impacts going forward. Congratulations for all you've done to help us.

Gu: Thank you, Eric. I really appreciate it. It means a lot coming from you. Honestly, I couldn't have gotten to where I am today without the help of the experts and people like yourself who are very big supporters of my work. I think that also really helped in terms of others evaluating my model — to know that someone with your background and expertise has that faith in my work. I also want to thank you for being one of my biggest supporters, if not the biggest. I get messages from reporters saying, "Eric Topol cited your work and so I want to learn more about it."

Topol: It's easy because it's the best there is out there. We'll follow you closely. For many years ahead, there are so many great things that will be happening in your career and the impact you'll have. Thank you for joining us today.

To the listeners and people who have watched or listened to our conversation, we hope we've inspired you with our first data scientist. It's overdue. What a great one we had a chance to meet and talk with today. Thank you.

Gu: Thank you for having me. It was a pleasure to be here. I really enjoyed our conversation.

Eric J. Topol, MD, is one of the top 10 most cited researchers in medicine and frequently writes about technology in healthcare, including in his latest book, Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again.

Youyang Gu, MA, is an independent data scientist and creator of the COVID-19 projections model discussed in this interview. He was featured in the 2014 book Young Leaders 3.0: Stories, Insights, and Tips for Next-Generation Achievers.
