Using Big Data to Monitor the Introduction and Spread of Chikungunya, Europe, 2017

Joacim Rocklöv; Yesim Tozan; Aditya Ramadona; Maquines O. Sewe; Bertrand Sudre; Jon Garrido; Chiara Bellegarde de Saint Lary; Wolfgang Lohr; Jan C. Semenza


Emerging Infectious Diseases. 2019;25(6):1041-1049. 

Big Data and Emerging Infectious Diseases

In light of the arrival and explosive expansion of chikungunya in the Americas in 2013 through Ae. aegypti mosquitoes,[24] big data offer the opportunity to monitor the introduction and spread of chikungunya in Europe. An outbreak can be divided, broadly speaking, into 2 distinct phases. The first phase is importation of the virus via a viremic person into a virus-naive population. For this phase, we used big data (volume) to estimate air passenger-journeys from areas with active chikungunya transmission as a measure of the force of introduction of the virus into the outbreak zones in Europe. To identify areas with onward transmission risk, we also considered the volume of air passengers leaving these outbreak zones. For the second phase, the establishment of autochthonous transmission in Europe is a function of virus importation, population density, vector activity, climate conditions, exposure patterns, and several other factors that are more difficult to quantify.[17] Our study addressed some of these epidemiologic challenges by using big data. Rather than a Twitter content analysis, which has been performed for several outbreaks,[25–28] we used near–real-time geocoded Twitter data (velocity) to quantify human mobility patterns and disentangle connectivity between populations. Mobility estimates also reflect population density and indirectly take into account exposure patterns because such populations on the move are occasionally susceptible to exposure and are also a source of exposure. The ecology of the virus and the human-vector transmission cycle were captured by vectorial capacity (variety), which quantified transmission risk on the basis of climate conditions. Thus, we were able to quantify the trajectory of an arbovirus outbreak by dissecting and better understanding its phases.

Our analysis of big data revealed distinct mobility patterns between the outbreak zones in France and Italy, between Rome and Anzio, and between Rome and most of the local outbreak clusters in Italy. However, the potential effects of these mobility patterns on local spread need to be confirmed epidemiologically by phylogenetic analyses. Although the sensitivity of our risk maps based on mobility and climate data to identify areas at risk for virus spread was good, the specificity needs to be further improved, for example, by including local contextual factors such as land use and vector activity. Wikipedia page hits and Google Trends have been proposed as resources for disease surveillance and outbreak detection. However, our analysis indicates that these sources mainly reflected public awareness of the chikungunya outbreaks as they peaked. For this reason, they seem to be of little use for early response.

The combination of short-distance air passenger-journeys (within Europe, as opposed to overseas) and geocoded Twitter data lends itself to cross-validation. We found that the 2 approaches consistently identified several cities with established vector populations at a heightened risk for virus importation, reflecting the potential for spread between countries and cities in Europe. Some of these regions had previously encountered autochthonous transmission.[29]

The R0 estimates, which were derived by using epidemiologic data, were in accordance with the vectorial capacity predictions for the outbreak zones based on local climate conditions. Based on the vectorial capacity, R0 can be derived by multiplication with the infectious period. For chikungunya, an infectious period of 3–7 days was reported.[30] The vectorial capacity of ≈0.7 would give rise to an R0 of ≈2–3. This range is within that which we observed in the Rome and Anzio regions in July and August, but the vectorial capacity was estimated to be higher (≈0.8) in the Calabria region, translating into an R0 of just over 3–4, which is in agreement with the epidemiologic analysis of the outbreak data (Figure 2).
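
As a rough illustration of the arithmetic described above, the following sketch multiplies a daily vectorial capacity by the infectious period to obtain a range for R0. The function name r0_range and the choice to evaluate the full 3–7 day interval are assumptions made for this example; the pairing of ≈0.7 with Rome and Anzio and ≈0.8 with Calabria follows the text. It is not the authors' code.

```python
def r0_range(vectorial_capacity, infectious_days=(3, 7)):
    """Return (low, high) R0 as vectorial capacity x infectious period.

    vectorial_capacity: daily vectorial capacity (illustrative input).
    infectious_days: (shortest, longest) infectious period in days;
        the default is the 3-7 day range cited in the text.
    """
    shortest, longest = infectious_days
    return (vectorial_capacity * shortest, vectorial_capacity * longest)


# Approximate vectorial capacities quoted in the text.
for region, vc in [("Rome and Anzio", 0.7), ("Calabria", 0.8)]:
    low, high = r0_range(vc)
    print(f"{region}: vectorial capacity {vc}, R0 from {low:.1f} to {high:.1f}")
```

Evaluated over the whole 3–7 day interval, the product spans about 2.1 to 4.9 for a vectorial capacity of 0.7 and about 2.4 to 5.6 for 0.8. The narrower ranges quoted in the text (≈2–3 and just over 3–4) sit largely within these spans, which suggests they correspond to only part of the reported infectious period; that reading is an inference from the arithmetic, not something the article states.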

Although our mobility analysis showed that the local mobility from Var was considerable, no autochthonous chikungunya cases were reported from other identified risk regions along the Mediterranean coast of France and in northern Spain. However, the vectorial capacity of Ae. albopictus mosquitoes to transmit the virus is lower in Var than in Lazio, which may explain this discrepancy. Previous studies assessing the risk for local outbreaks after outbreaks outside of Europe found that inbound flight traveler frequencies correlated strikingly well with local reports of virus importation frequencies into Europe.[9] However, most of these studies evaluated these risks independently and did not attempt to estimate the combined risk for virus importation and climate suitability.[31,32] Moreover, they did not assess local dispersion patterns from airports or outbreak areas. We analyzed big data for long- and short-distance mobility. A major strength of this big data approach is the near–real-time availability of mobility patterns based on social media, which are timelier, more accessible, and less costly than air passenger data available from commercial providers, such as the IATA. This approach can identify areas of heightened mobility that are potentially at risk for onward transmission, as we have shown in this analysis. Geocoded Twitter data can be a good proxy for human mobility,[15] but prior research did not explore how such data can be a timely resource for preparedness and response to infectious disease outbreaks.

Similar to others who have used IATA and Twitter data in their studies, we found these novel data sources to be reliable and useful. However, we note that Twitter data can potentially be biased because Twitter users may represent a select population whose mobility patterns differ from those of the general population; more specifically, they represent a population of Twitter users who have allowed Twitter to follow their geolocations. Future studies need to validate the use of social media data in such applications. These methods are an improvement over mobile telephone tracking data because they do not rely on a single provider network and are a less costly data source to acquire.

Seasonal weather forecasts might have provided better input into the assessment of vectorial capacity, specifically for the fall of 2017. Moreover, autochthonous transmission risk may also be related to local proliferation of vectors and local environmental, social, and behavioral characteristics, such as awareness about the symptoms of chikungunya (Appendix 3). Such factors have been found to be associated with the local transmission risk for dengue.[33] Last, because of the paucity and underreporting of chikungunya cases, we may have underestimated the passenger volume from active transmission areas in Africa.

This study illustrates the potential value of using big data[18–20] to pinpoint areas at risk for the introduction and dispersion of emerging infectious diseases. The analysis showed that the areas at greatest risk were those in close proximity to the original outbreaks and several larger metropolitan areas. The trajectory and sustained spread of emerging infectious diseases can be anticipated with predictive modeling in real time. This study suggests that big data can be an indispensable tool for the prevention and control of emerging infectious diseases.