Estimating the Secondary Attack Rate and Serial Interval of Influenza-like Illnesses Using Social Media

Elad Yom-Tov; Ingemar Johansson-Cox; Vasileios Lampos; Andrew C. Hayward


Influenza Resp Viruses. 2015;9(4):191-199. 

Abstract and Introduction


Objectives Knowledge of the secondary attack rate (SAR) and serial interval (SI) of influenza is important for assessing the severity of seasonal epidemics of the virus. To date, such estimates have required extensive surveys of target populations. Here, we propose a method for estimating the intrafamily SAR and SI from postings on the Twitter social network. This estimate is derived from a large number of people reporting ILI symptoms in themselves and/or their immediate family members.

Design We analyze data from the 2012–2013 and the 2013–2014 influenza seasons in England and find that increases in the estimated SAR precede increases in ILI rates reported by physicians.

Results We hypothesize that observed variations in the peak value of SAR are related to the appearance of specific strains of the virus and demonstrate this by comparing the changes in SAR values over time in relation to known virology. In addition, we estimate SI (the average time between cases) as 2·41 days for 2012 and 2·48 days for 2013.

Conclusions The proposed method can assist health authorities by providing near-real-time estimation of SAR and SI, and especially in alerting to sudden increases thereof.


Understanding the transmission dynamics of influenza and influenza-like illnesses (ILI) is vital for deciding on public health strategies to reduce the impact of the virus. An important parameter in the spread of pandemic diseases is their secondary attack rate (SAR): the probability that infection occurs among susceptible persons within a reasonable incubation period following known contact with an infectious person or an infectious source.[1] A further important parameter for influenza transmission models widely used to design control measures is the serial interval (SI): the time between symptom onset of a primary case and symptom onset of its secondary cases.
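As a minimal illustration of the SI definition above, the sketch below computes onset-to-onset gaps for primary/secondary case pairs and averages them. The function name `serial_intervals` and the onset dates are hypothetical and invented for this example; this is not the Twitter-based estimator proposed in this article.

```python
from datetime import date
from statistics import mean


def serial_intervals(pairs):
    """Return the gap in days between symptom onset of each primary case
    and symptom onset of its secondary case."""
    return [(secondary - primary).days for primary, secondary in pairs]


# Hypothetical (primary_onset, secondary_onset) pairs, for illustration only.
pairs = [
    (date(2013, 1, 2), date(2013, 1, 4)),
    (date(2013, 1, 5), date(2013, 1, 8)),
    (date(2013, 1, 7), date(2013, 1, 9)),
]

gaps = serial_intervals(pairs)  # one gap per case pair, in days
mean_si = mean(gaps)            # average serial interval in days
print(gaps, round(mean_si, 2))
```

In practice onset dates are rarely known exactly, so any such average inherits the uncertainty of the underlying reports.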

Collecting the data needed to compute SAR and SI entails tracking relatively large populations, or identifying cases and following up their contacts. The task is made harder by the fact that the majority of people suffering from influenza do not seek medical attention. For example, only 17% of laboratory-confirmed cases in a large community cohort in England sought medical attention.[2] Thus, periodic surveys are sometimes employed for data collection,[2] although these require a large effort from the researchers or health authorities who run them and from the members of the public who complete them.

Public health bodies monitor influenza based on those who seek medical attention, but this surveillance provides no direct information on transmissibility. Some countries also plan more detailed ascertainment of cases and their contacts during pandemics, but even these studies may have difficulty in estimating secondary attack rates within households because identified cases and their contacts are likely to receive antivirals.[3] Selection bias is inherent in outbreak investigations, which may also overestimate transmissibility.[4] A mechanism to routinely monitor an indicator of influenza transmissibility, such as the SAR, and of SI using standardized methodology that could be used on an international scale would therefore be an important tool to guide pandemic response.[5]

Behavioral data from the Internet in general, and social media in particular, are known to correlate well with various health behaviors. The severity of influenza has been tracked using search engines,[6] advertisements,[7] and social media.[8] Although the accuracy of the first of these has been criticized,[9] partially for its sensitivity to media attention to seasonal flu, it remains an inexpensive and near-real-time tool for monitoring influenza load across multiple geographies.

Tracking influenza load through Internet activity provides a more sensitive sensor than hospitalizations and doctor visits because it serves as a window into people's health concerns even when these do not warrant a visit to a medical facility. Internet data are also advantageous over surveys because they can be collected with much less effort. Their drawbacks are that they cannot be directly verified (e.g., using genetic testing for the specific strain of the virus from which a person is suffering), that they use ambiguous language, and that people sometimes overdiagnose themselves.[10]

Here, we study a specific type of SAR known as the familial or household secondary attack rate (fSAR). fSAR is defined as the probability that at least one household contact becomes a secondary case given that one of the family members was infected.[5] We estimate fSAR by observing reports of influenza-like illness and its symptoms in social media, and whether they pertain to the reporting user themselves or to their immediate family members.
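The fSAR definition can be made concrete with a short sketch. It assumes, purely for illustration, that for each household with an index case we already know how many other members later became ill; the function name `household_sar` and the counts are hypothetical. It shows the definition as a simple proportion and is not the estimation procedure applied to Twitter data.

```python
def household_sar(secondary_counts):
    """Estimate fSAR as the fraction of households with an index case
    in which at least one other household member became a secondary case.

    `secondary_counts` holds one integer per household: the number of
    secondary cases observed in that household.
    """
    if not secondary_counts:
        raise ValueError("need at least one household with an index case")
    with_secondary = sum(1 for n in secondary_counts if n >= 1)
    return with_secondary / len(secondary_counts)


# Hypothetical counts for eight households, for illustration only.
secondary_counts = [0, 1, 0, 2, 0, 0, 1, 0]
fsar = household_sar(secondary_counts)
print(fsar)
```

Note that a household with two secondary cases counts once, because the definition concerns whether at least one contact was infected rather than how many.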