Using Big Data to Monitor the Introduction and Spread of Chikungunya, Europe, 2017

Joacim Rocklöv; Yesim Tozan; Aditya Ramadona; Maquines O. Sewe; Bertrand Sudre; Jon Garrido; Chiara Bellegarde de Saint Lary; Wolfgang Lohr; Jan C. Semenza


Emerging Infectious Diseases. 2019;25(6):1041-1049. 

Abstract and Introduction


With regard to fully harvesting the potential of big data, public health lags behind other fields. To determine this potential, we applied big data (air passenger volume from international areas with active chikungunya transmission, Twitter data, and vectorial capacity estimates of Aedes albopictus mosquitoes) to the 2017 chikungunya outbreaks in Europe to assess the risks for virus transmission, virus importation, and short-range dispersion from the outbreak foci. We found that indicators based on voluminous and velocious data can help identify virus dispersion from outbreak foci and that vector abundance and vectorial capacity estimates can provide information on local climate suitability for mosquitoborne outbreaks. In contrast, more established indicators based on Wikipedia and Google Trends search strings were less timely. We found that a combination of novel and disparate datasets can be used in real time to prevent and control emerging and reemerging infectious diseases.


Many sectors of society have taken full advantage of new opportunities provided by big data, but public health has not.[1] Although electronic health records have long been used in surveillance, novel applications of big data are rare. Internet search query data from Google or Wikipedia have been applied to anticipate influenza epidemics but are hampered by several limitations, including specificity and granularity.[2–4] More recently, crowdsourcing of symptoms through emails, text messages, or tweets has been explored, and outbreaks have been tracked by scanning high-volume surveillance systems.[5,6] However, when it comes to fully harvesting the potential of big data, public health still lags behind other fields. Using chikungunya as a case study, we illustrate how big data can help tackle emerging infectious diseases through prevention, detection, and response.

A key driver of the emergence and spread of vectorborne diseases is human mobility,[7–10] yet little is known about the epidemiologic consequences of mobility patterns at different spatial scales within the context of vectorborne diseases. A main obstacle to studying the complex interactions between human hosts, pathogens, and vectors has been the limited availability of spatiotemporal datasets for analyzing human mobility patterns. Prior research relied on low-resolution mobile phone records, such as call and messaging logs from mobile phone networks,[11–13] for which biases were notable.[14,15] Furthermore, use of mobile phone data for tracking human mobility is likely to be fraught with privacy concerns and data access restrictions.[15]

Recently, social media has emerged as an alternative source of real-time, high-resolution geospatial data on a large scale.[1,15] Use of this unique aspect of publicly available social media data to study the human dimensions of the introduction and spread of emerging infectious diseases has not been explored to its fullest extent. In areas where risk for virus importation and onward transmission is heightened, such knowledge can inform outbreak preparedness and response planning by pinpointing receptive areas where proactive countermeasures should be implemented in a timely fashion.[16,17]

The impediments to using big data in public health are not only the size of the databases but also the complexity of their processing. The challenges include 3 main dimensions: volume, velocity, and variety.[18–20] Volume calls for statistical sampling; velocity, for instant access to near real-time transaction data; and variety, for management of nonaligned data structures. We illustrate how big data can be used to monitor the introduction and spread of the 2017 chikungunya outbreak in Europe by tackling these challenges.[18–20]

To assess risk for virus importation from international areas with active chikungunya transmission, we extracted air passenger volume from large-scale aviation data. To quantify the risk for short-range dispersion (defined as the potential for onward transmission and spread of chikungunya virus from the initial outbreak foci to other areas during transmission season), we used a mining algorithm to process quasi–real-time, geolocated Twitter activity data and computed mobility patterns of users. We have previously shown that mobility data from Twitter users are predictive of disease spread.[21] We then estimated the seasonal vectorial capacity of Aedes albopictus mosquitoes to transmit chikungunya virus and linked it with human mobility patterns. We further complemented these data with Internet and information search activities related to chikungunya infection, vectors, and clinical signs and symptoms collected from Wikipedia and Google Trends. Last, we estimated the empirical basic reproduction number (R0) from the outbreaks and compared these numbers with our model predictions of epidemic potential based on climate conditions. More detail on our methods is provided in Appendix 1.
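
As a rough illustration of how a vectorial capacity estimate might be linked with mobility patterns, the sketch below may help. It is not the authors' model, which is described in Appendix 1. It assumes the classic Ross-Macdonald form of vectorial capacity and a simple multiplicative weighting by the share of travelers from an outbreak focus. The function names, parameter names, and weighting scheme are hypothetical and chosen only for illustration.

```python
import math


def vectorial_capacity(m, a, b_h, b_m, mu, n):
    """Daily vectorial capacity in the classic Ross-Macdonald form (assumed here).

    m   : female vectors per human
    a   : bites per vector per day
    b_h : probability of vector-to-human transmission per bite
    b_m : probability of human-to-vector infection per bite
    mu  : vector mortality rate per day
    n   : extrinsic incubation period in days
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    return m * a**2 * b_h * b_m * math.exp(-mu * n) / mu


def dispersion_risk(vc_by_area, travel_share_by_area):
    """Weight each area's vectorial capacity by the share of travelers
    arriving from the outbreak focus (hypothetical weighting scheme).
    Areas with no recorded travel get a weight of 0."""
    return {
        area: vc * travel_share_by_area.get(area, 0.0)
        for area, vc in vc_by_area.items()
    }
```

In practice, several of these parameters depend on temperature, so a seasonal estimate would require evaluating the function for each period with climate-dependent inputs. Any numeric values passed in would need to come from the entomologic literature rather than from this sketch.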