Assessing Risk for Suicide at Point of Gun Purchase

F. Perry Wilson, MD, MSCE


July 11, 2022

This transcript has been edited for clarity.

Welcome to Impact Factor, your weekly dose of commentary on a new medical study. I'm Dr F Perry Wilson of the Yale School of Medicine.

As you may notice, we're doing things a little bit differently this week. My usual producer, Ryan McAvoy, is out on a well-deserved vacation. So I'm shooting this in my office via webcam, using some slides and my own best-effort video editing.

This week I'll be talking about guns. It's been the summer of the gun, to some extent. We had the Highland Park mass shooting on July 4th, which was a terrible tragedy. Of course, the Uvalde shooting at Robb Elementary School is still fresh in our minds, and there have been many other mass shootings. It would be too long a video to list all of the mass shootings that have occurred this summer. But I do want to remember the shooting at the Tulsa, Oklahoma, hospital where two physicians lost their lives.

Mass shootings, though, are in some sense the tip of the iceberg. So far in 2022, there have been 323 mass shootings in the United States. That is an astounding number to think about, given that we're just a little bit more than halfway through the year. But when it comes to gun violence in the United States, that's just a marker of the most extreme shooting events we see. Mass shootings generate so much discussion and interest because of the magnitude of the tragedy, obviously, and also the chaotic nature, the randomness, the feeling that no one is safe. But if you look at the numbers in terms of gun deaths, so far in 2022, there have been 22,864 gun deaths in the United States (courtesy of the Gun Violence Archive).

The vast majority are not mass shootings. In fact, if you look at the data, the majority of gun deaths in the United States in 2022 were due to gun-based suicide. And, of course, a significant chunk are homicides, accidents, etc., but we don't talk about suicide in the context of gun violence as often as we should, especially given that the majority of deaths come from gun suicide. I want to change that this week.

A gun is about the most lethal means of attempting suicide there is. A study from Annals of Internal Medicine in 2019 looked at fatal suicides across the United States. Here you can see the lethality of various forms of attempted suicide.

For example, deliberate overdosing as a suicide attempt results in death less than 2% of the time. Nothing comes close to the 90% mortality rate of suicide attempt by gun. This may seem pedantic, but this is what guns are designed to do. They're designed to kill, and they are very effective at doing that.

This is just one front on the question of where guns belong in American society. Focusing on suicide is a reasonable point because of statistics like these. People will say, "Oh, well, you know, people who are determined to commit suicide will find a way; whether they have access to a gun or not, they're going to find a way." But the data don't really support that.

Here are data from California in 2020, published in The New England Journal of Medicine. California simply has much better records on these kinds of things than many other states, which is why you see a lot of research out of California on this. The study looks at the overall suicide rate among gun owners vs non–gun owners, and the rate of suicide by gun in each of those groups.

It's stratified here by men and women. Looking at the suicide rate overall, it's significantly higher in people who own guns. Part of that is explained by the fact that if you own a gun, it is much easier to successfully commit suicide. You can see that among gun owners, the gun suicide rate (per 100,000 individuals) is 44 against an overall suicide rate of 49. In other words, if you own a gun and you are going to commit suicide, you're pretty much committing suicide by gun, whereas non–gun owners are not only much less likely to commit suicide overall but also less likely to commit gun-based suicide. So it's not just that people will figure out a way. A gun is an incredibly effective means of ending a life, and that may be the real problem.

What's interesting about this dataset is that the overall mortality rate (cancer, heart disease, etc.) was actually a bit lower in gun owners. They may be a healthier bunch overall. I'm not sure about the explanation for that, whether it's being outdoorsy, hunting, getting more exercise — I don't know. But the suicide rate is three- to fivefold higher among men who own guns, and almost 30 times higher among women who own guns. So gun ownership is a major risk factor.

It may partly be because of the access to a gun. A gun is so effective that if it's there at the moment when the depression is deepest and you've lost control, it can be over at that point. You don't need much of a plan. You just need access to a gun.

This shows, from that same study, survival free of suicide over time: people who are gun owners compared with non–gun owners (the dashed line). Looking at the time from when the gun is purchased, there is a steep early drop before the curve starts to level out. These data have been replicated in several datasets, and they suggest that the highest risk for suicide is shortly after someone purchases a gun. Keep that in mind, because it bears directly on the study I want to talk to you about today, which appears in JAMA Network Open: "Machine Learning Analysis of Handgun Transactions to Predict Firearm Suicide Risk." You might recognize the name Garen Wintemute. He is a very prominent gun violence researcher who has written a lot on this subject; a pretty knowledgeable guy.

Here's the interesting question: Because we know that the risk for suicide increases substantially right after a person buys a gun, can we predict that risk at the moment of purchase? And if we can, what should we do with that information?

This is how this study was set up.

Basically, it's a California study because they have very good records. Any time a handgun changes hands in California, it has to be recorded by law on the Dealer's Record of Sale worksheet, which registers who had the gun, who gave it to whom, and information about the gun itself. They had 1.9 million people in this database, which represents almost 5 million handgun purchases. This is common. People who own guns tend to own more than one.

Of these 1.9 million people, 2614 died by handgun suicide. That was linked to vital statistics data from the state of California. That's our setup. We know who bought a gun and when. We know the characteristics of the gun and we know that some small but significant proportion of them would die by suicide within the next 12 months after buying that gun. That's their target for prediction.

Based on these forms, they have a lot of data to put into a machine-learning model to predict suicide — demographics like age, sex, race, ethnicity, etc.

They had geo-coded addresses of the purchaser and the dealer; census tract information that tells you things like poverty, insurance status, etc.; and county-level firearm ownership and suicide information that anchors people to the place they come from (for example, whether this is a county with a high baseline suicide rate regardless of gun ownership, in which case overall risk is going to be higher). They also had characteristics of the gun itself: What's the caliber, the chamber size? What type of gun is it, a revolver or a semiautomatic? And they had characteristics of the transaction itself and, importantly, the number of prior purchases the individual had made over the past year, 1-5 years, and 10 years. All of these are potential predictors of suicide.

But does it work? Can you predict suicide from this type of information? They used a machine-learning algorithm called a random forest. I'm going to digress briefly to tell you how random forests work; they belong to a family of tree-based machine-learning methods.

This is what's called a decision tree.

It's a really simple algorithm. It takes a single factor at the top, let's say age, and splits the population into two pieces. The algorithm walks through ages from youngest to oldest, trying various split points and measuring how different the two resulting groups are in terms of risk for suicide.

It will find the best dividing line by age. Let's say 55 years is a good split. You have the under 55s on one side and the over 55s on the other, and there's a pretty good difference in suicide rates when you split them like that. That takes you to the second level of the decision tree where you can have another factor — for example, sex, which is not a continuous variable so it splits more easily. Or you might have another predictor, such as blood pressure.

The algorithm marches down until you get to a small number of people and you have the suicide rate at those endpoints. That's a single decision tree.
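The split-finding step described above can be sketched in a few lines of Python. This is purely a toy illustration: the ages, outcomes, and the use of Gini impurity as the splitting criterion are my assumptions, not details from the study.

```python
# Toy illustration of how a decision tree picks a split point on a
# continuous predictor such as age. All data here are made up.
import numpy as np

def gini(group):
    """Gini impurity of a 0/1 outcome vector (lower = purer)."""
    p = group.mean()
    return 2 * p * (1 - p)

def best_split(age, outcome):
    """Try every candidate cutpoint and keep the one whose left/right
    groups are purest (i.e., most different in outcome rate)."""
    best_cut, best_score = None, float("inf")
    for cut in np.unique(age)[:-1]:
        left, right = outcome[age <= cut], outcome[age > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(age)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut

age = np.array([25, 30, 40, 52, 55, 60, 70, 75])
outcome = np.array([0, 0, 0, 0, 1, 1, 1, 0])  # 1 = event occurred
print(best_split(age, outcome))  # prints 52: events cluster above this age
```

The loop mirrors the "walk through ages from youngest to oldest" idea; real tree implementations do the same thing, just faster and over every candidate predictor.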

Single decision trees can work okay for some things, but a random forest creates a forest of decision trees, each of which might start with a different factor. One tree might start with age but another could start with handgun caliber, for example. And so 1000 of these decision trees are made. Each individual is then walked through each of the 1000 trees and, at the end of each, gets a prediction about whether they are going to commit suicide (red is yes and green is no). Those predictions are averaged across all 1000 trees; essentially, all of these different trees are voting on what they think is likely to happen. Importantly, you get a yes-or-no prediction: Yes, I think there will be a suicide; or no, no suicide. But you also get a number that you can interpret sort of like a chance for suicide. A higher number is worse, but it doesn't map exactly to a percent risk for suicide.
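That voting scheme is easy to demonstrate with scikit-learn. To be clear, this is an illustrative sketch on fully synthetic data; the predictors, the outcome model, and every number below are my inventions, not the study's actual features or code.

```python
# Illustrative sketch only: synthetic data and made-up predictors.
# Shows how a random forest averages many trees' votes into a 0-1 score.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 5_000
X = np.column_stack([
    rng.integers(21, 85, n),   # hypothetical: purchaser age
    rng.integers(0, 2, n),     # hypothetical: 1 = revolver, 0 = semiautomatic
    rng.poisson(0.5, n),       # hypothetical: prior purchases, past year
])
# Synthetic rare outcome, loosely tied to the predictors.
p = 1 / (1 + np.exp(-(0.03 * X[:, 0] + 1.5 * X[:, 1] - 6)))
y = rng.random(n) < p

# A study-style forest would use ~1000 trees; 200 keeps this quick.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# predict_proba averages the trees' yes/no votes: the 0-to-1 score
# described in the text, where higher = more trees voting "yes".
scores = forest.predict_proba(X)[:, 1]
print(scores.min(), scores.max())
```

The key point is the last line: the score is a vote share across trees, which is why it ranks people usefully without mapping exactly to a percent risk.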

So in this study, how did their model perform when they fed it all these data? And by the way, they did this appropriately; they had a training set and a validation set. Statistically, it looks fine to me.

Their area under the curve (AUC) was 0.81. People always ask me what that means. The AUC (more specifically, the area under the receiver operating characteristic curve) is a measure of discrimination. An AUC of 0.81 means that if you take one person from the dataset who will commit suicide and one random person who won't, your model will give a higher score to the person who actually does commit suicide 81% of the time. In other words, take two people, one who had the outcome and one who didn't: how often does your model score the right person higher? An AUC of 0.5 would be no better than chance and an AUC of 1.0 would be perfect, so 0.81 is pretty good but far from perfect.
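That pairwise definition of the AUC can be computed directly. The scores and outcomes below are made-up toy values, just to show the arithmetic.

```python
# Toy illustration of the pairwise meaning of AUC: the probability that
# a randomly chosen person with the outcome scores higher than a
# randomly chosen person without it (ties count as half).
import numpy as np

scores = np.array([0.10, 0.40, 0.35, 0.80, 0.70, 0.20])
outcome = np.array([0, 0, 1, 1, 1, 0])  # 1 = outcome occurred

pos = scores[outcome == 1]   # people who had the outcome
neg = scores[outcome == 0]   # people who did not
diffs = pos[:, None] - neg[None, :]   # every (pos, neg) pairing
auc = ((diffs > 0).sum() + 0.5 * (diffs == 0).sum()) / diffs.size
print(auc)  # 8 of the 9 pairs are ranked correctly, so 8/9 = 0.888...
```

This pair-counting definition agrees exactly with the area under the ROC curve; libraries such as scikit-learn's `roc_auc_score` just compute it more efficiently.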

I'll show you what that boils down to.

Of all the variables, what can we learn about who is at risk here? This shows one measure for how you can tell what's important in a machine-learning model. Basically, this graph says that if you remove this factor (or mix up all the values in this factor so they're nonsense), how much do you hurt the accuracy of your prediction? Firearm category was the number-one predictor in terms of increasing the accuracy of prediction. Firearm category broke down to revolver vs semiautomatic handgun. People buying revolvers were at substantially higher risk. This is interesting because semiautomatics are the more common purchase compared with revolvers.

We can only speculate as to why this is a marker of suicide risk. One obvious reason is that you don't need a high-capacity semiautomatic weapon to commit suicide. You only need one bullet. A revolver might do just fine. Maybe more research is needed there.

Race and ethnicity were important. White men tend to be at higher risk for suicide than other racial and ethnic groups. The month of the year matters a little bit.

Let's break this down now to the risk across the spectrum. You're going to get a number at the end of this machine-learning model that ranges from 0 to 1, where 0 is the model saying, I don't think this person is at any risk, and 1 is the model saying, I'm super-certain that this person is at risk for suicide in the next year.

They broke these numbers down into ventiles — 20 equal-sized buckets.

The number of people in each bucket was the same. The highest-risk bucket, the top 5% of scores, captured almost 40% of the suicides that occurred within the next year. That's really high. The lowest-risk bucket (the bottom 5% of scores) captured almost none. That's based only on the factors I showed you going through this algorithm. It doesn't know anything about the mental status of the person. No interviews or surveys were done. The model had no objective measures, such as blood pressure or heart rate. And yet, you get some decent predictions.

Let's zoom in on those highest-risk groups.

Among those with scores of .98 or .99, virtually all actually end up committing suicide within a year. That's really interesting, except that I need to point out that there are very few people in these super–high-risk groups. The .95-and-higher group contains a total of 35 people. Out of nearly 2 million gun purchasers, only 35 turn out to be super–high risk. These people are at very high risk for suicide. But you are going to miss a ton of people because very few people have all their stars aligned in the algorithm to have the highest possible score. So although this looks really good, you're not capturing everyone. You're just flagging these rare, ultra–high-risk individuals.

We have these risk thresholds that the model is going to output, but what can we do with that information? Full disclosure here: I had to do some of the math. This doesn't appear in the paper. I'm pretty sure these numbers are correct, but I'm not 100% certain. Please feel free to mention in the comments if I got something wrong, but I think this is right.

If you target scores above zero, you are targeting everyone, because everyone has a score above zero. That means whatever your intervention is must take place at the point of sale. You calculate the risk at the point of sale and you target people who are above the threshold. This is 100% of gun purchasers, and 100% of the people who will go on to commit suicide are in that group because it's everybody. Your positive predictive value (PPV), which is the percentage of people that you are targeting who will go on to commit suicide, is only 0.07% because that's the suicide rate in the entire population of gun owners. What could your intervention be? What would you do here? You can't do much because this is literally everyone. It would be very expensive to do anything intensive on everyone, and I'm sure people would consider it a violation of their rights. You could give out pamphlets at the point of gun sale with information about suicide, hotlines, etc. You are quite limited.

As you increase the score threshold for intervening, your options become more interesting. For example, at a score threshold of 0.38, you are only targeting 29% of gun purchasers for the intervention, and the other 71% just walk out the door. We don't think they're very high risk. But you would still capture 75% of the suicides. Your PPV has gone up to 0.2%. Maybe now you can start thinking about some kind of outreach, such as emailing to check in on people over time; something low-cost.

As we ramp this number up to scores above 0.5, now you would only be intervening on 10% of gun purchasers. You're still getting 50% of all the suicides at that level. So you're going to miss people. But you are targeting a smaller group so you can do more. The PPV goes up to 0.3%. Maybe a longer waiting period, for example. Oh, sorry. You're in the top 10% of suicide risk. You have to wait a couple of months. Something like that. I'm not a lawyer. This is not about the Constitution. I'm merely trying to explain how this works from a public health standpoint.

A little bit higher (scores above 0.57), you would target only 5% of gun purchasers. Now you're getting 38.6% of suicides. At that point, perhaps they need some kind of evaluation or check-in to see how they are doing. You might need some type of professional to say that, yes, this is fine.

And just to point out, if you went really high, to scores above 0.9, almost no one buying a gun meets that threshold: only 0.1% of gun purchasers. You would miss most of the suicides, capturing only 4.4% of them. But 14.5% of everyone you intervened on would go on to die by suicide. That is a very high PPV. This is the point where you start to wonder: If we say 14.5% of these people are going to go on to commit suicide with this gun, maybe you don't give them a gun? Of course, 85% of these people aren't going to commit suicide with this gun. These are difficult issues.
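The threshold trade-off I just walked through, fraction flagged vs suicides captured vs PPV, can be reproduced on synthetic data. The score distribution and outcome rate below are assumptions chosen only to mimic the qualitative pattern, not the study's numbers.

```python
# Illustrative sketch of the score-threshold trade-off: as the cutoff
# rises, you flag fewer purchasers and capture fewer events, but PPV
# (the fraction of flagged people with the outcome) climbs.
# All data here are synthetic; the distributions are my assumptions.
import numpy as np

def threshold_tradeoff(scores, events, cutoff):
    flagged = scores >= cutoff
    frac_flagged = flagged.mean()                       # share targeted
    sensitivity = events[flagged].sum() / events.sum()  # share of events captured
    ppv = events[flagged].mean()                        # P(event | flagged)
    return frac_flagged, sensitivity, ppv

rng = np.random.default_rng(2)
n = 500_000
scores = rng.beta(1, 3, n)                 # most scores are low
events = rng.random(n) < scores * 0.01     # rare, score-linked outcome

for cutoff in (0.0, 0.38, 0.5, 0.9):
    f, s, p = threshold_tradeoff(scores, events, cutoff)
    print(f"cutoff {cutoff:.2f}: flag {f:6.1%}  capture {s:6.1%}  PPV {p:.2%}")
```

At a cutoff of 0, everyone is flagged, so sensitivity is 100% and PPV equals the base rate; raising the cutoff trades sensitivity for PPV, which is exactly the policy tension in the passage above.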

A major limitation of the study: All of these guns were legally transferred from one person to another, at a gun dealer or a gun show. California captures all of these, but illegally obtained and transferred guns do not appear in this dataset. In fact, of all the gun suicides in California over this time period, only about 38% of the weapons appeared in the registry. So roughly 62% of the gun suicides weren't analyzed here because no gun purchase could be documented; those people either got the gun from someone else or the gun was never registered, something like that. So there are limitations here.

What's really interesting about this study is when you think about how artificial intelligence is going to move forward. Right now, at that very high end we could potentially target those among whom 15% are going to commit suicide within a year. You might say, "I'm not going to sacrifice the rights of the 85% of people who don't commit suicide just to protect the 15% of people who do." But at some point, these machine-learning algorithms are going to get better. As more data are collected, we will know more about these people, perhaps integrating social media posts or biometrics at the point of sale. What if we had a model that was 100% accurate at some threshold? With a score above X, we are 99.9% sure that this individual is going to commit suicide within the next year. What do you do with that information? Can you do anything? Do you have to do something?

We're not there yet. Right now this is sort of a cognitive exercise and it echoes the kind of concerns about living in a society where you can arrest someone before they've done anything wrong. But death by suicide is a bit different. And because of the incredible efficacy of guns in this space and the fact that the majority of gun deaths in this country are deaths by suicide, studies like this that start to give us insight into who is at risk are going to play an ever larger role in this discussion. I'm not sure what's going to happen. I'm not sure whether it's going to get shut down and it will be illegal to make predictions about people at the point of gun sale. That's a possibility. But is there room at some level of risk to mandate some kind of evaluation, at least before people can purchase a gun? Will we save lives that way? That's for the future. If we get some studies like that, we'll certainly talk about it here.

Thanks for joining me for this unusual episode of Impact Factor. If you like this format, let me know. Otherwise, next week we'll be back in the studio and with our regularly scheduled programming.


