Use of Unstructured Event-based Reports for Global Infectious Disease Surveillance

Mikaela Keller; Michael Blench; Herman Tolentino; Clark C. Freifeld; Kenneth D. Mandl; Abla Mawudeku; Gunther Eysenbach; John S. Brownstein


Emerging Infectious Diseases. 2009;15(5) 

The GPHIN Project


GPHIN took early advantage of advancements in communication technologies to provide coordinated, near real-time, multisource, and multilingual information for monitoring emerging public health events.[20,21] In 1997, a prototype GPHIN system was developed in a partnership between the government of Canada and WHO. The objective was to determine the feasibility and effectiveness of using news media sources to continuously gather information about possible disease outbreaks worldwide and to rapidly alert international bodies to such events. The sources included websites, news wires, and local and national newspapers retrieved through news aggregators in English and French. After the outbreak of severe acute respiratory syndrome (SARS), a new, robust, multilingual GPHIN system was developed and launched on November 17, 2004, at the United Nations.

Data Acquisition

Automated Process. The GPHIN software application retrieves relevant articles every 15 minutes (24 hours/day, 7 days/week) from news-feed aggregators (Al Bawaba and Factiva) according to established search queries that are updated regularly. The matching articles are automatically categorized into one or more GPHIN taxonomy categories, which cover the following topics: animal, human, or plant diseases; biologics; natural disasters; chemical incidents; radiologic incidents; and unsafe products.

Articles with a high relevancy score are automatically published to the GPHIN database. The GPHIN database is also augmented with articles obtained manually from open-access websites. Each day, GPHIN handles ≈4,000 articles. This number increases sharply when events with serious public health implications, such as the finding of melamine in various foods worldwide, are reported.
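As a rough illustration of how a single article can fall into more than one taxonomy category, the sketch below tags text by keyword overlap. The category names are a subset of those listed above; the keyword sets and the `categorize` function are invented for this example, and the article does not describe how GPHIN actually performs the categorization.

```python
import re
from typing import List

# Category names are taken from the article; the keyword sets are invented
# for illustration and are NOT GPHIN's actual search terms.
TAXONOMY = {
    "human diseases": {"influenza", "cholera", "measles"},
    "animal diseases": {"avian", "poultry", "livestock"},
    "natural disasters": {"earthquake", "flood", "cyclone"},
    "unsafe products": {"recall", "melamine", "contaminated"},
}


def categorize(text: str) -> List[str]:
    """Return every taxonomy category with at least one keyword in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [name for name, keywords in TAXONOMY.items() if words & keywords]


if __name__ == "__main__":
    # A single headline can match several categories at once.
    print(categorize("Avian influenza detected on poultry farm"))
```

A production system would need far more than keyword matching (multilingual input, synonyms, context), so this only conveys the one-to-many mapping between articles and categories.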

Human Analysis Process. Although the GPHIN computerized processes are essential for the management of information about health threats worldwide, the linguistic, interpretive, and analytical expertise of the GPHIN analysts makes the system successful. Articles with relevancy below the "publish" threshold are presented to a GPHIN analyst, who reviews the article and decides whether to publish it, issue an alert, or dismiss it. Additionally, the GPHIN analyst team conducts more in-depth tasks, including linking events in different regions, identifying trends, and assessing the health risks to populations around the world.
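The division of labor between the automated and human steps can be sketched as a simple routing rule on the relevancy score. This is a minimal sketch under stated assumptions: the threshold value, the 0-to-1 score scale, and all names (`Article`, `Route`, `route`, `triage`) are hypothetical, because the article gives neither the actual cutoff nor the scoring method.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class Route(Enum):
    AUTO_PUBLISH = "auto_publish"      # published without human review
    ANALYST_REVIEW = "analyst_review"  # analyst publishes, alerts, or dismisses


# Hypothetical cutoff; the article does not state the real value or scale.
PUBLISH_THRESHOLD = 0.85


@dataclass
class Article:
    title: str
    relevancy: float  # assumed here to be normalized to the range [0, 1]


def route(article: Article, threshold: float = PUBLISH_THRESHOLD) -> Route:
    """Publish high-scoring articles directly; queue the rest for an analyst."""
    if article.relevancy >= threshold:
        return Route.AUTO_PUBLISH
    return Route.ANALYST_REVIEW


def triage(articles: Iterable[Article]) -> Dict[Route, List[Article]]:
    """Group a batch of articles (e.g., one 15-minute retrieval) by route."""
    queues: Dict[Route, List[Article]] = {r: [] for r in Route}
    for article in articles:
        queues[route(article)].append(article)
    return queues


if __name__ == "__main__":
    batch = [
        Article("Unusual respiratory illness reported", 0.93),
        Article("Local clinic extends opening hours", 0.40),
    ]
    for destination, items in triage(batch).items():
        print(destination.value, [a.title for a in items])
```

Keeping the threshold as a parameter reflects that the cutoff is a tuning decision: raising it sends more articles to analysts, lowering it publishes more without review.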

Data Dissemination

Machine Translation. English articles are machine-translated into Arabic, Chinese (simplified and traditional), Farsi, French, Russian, Portuguese, and Spanish. Non-English articles are machine-translated into English. GPHIN has adopted a best-of-breed approach in selecting engines for machine translation. The lexicons associated with the engines are constantly being improved to enhance the quality of the output. The machine-translated outputs are also edited by the appropriate GPHIN analysts. The goal is not to obtain a perfect translation but to ensure that the essence of the article is comprehensible.

Information Access. Users can view the latest list of published articles or query the database by using both Boolean and translingual metadata search capabilities. In addition, notifications about events that might have serious public health consequences are immediately sent by email to users in the form of an alert.

Project Results

In an initial assessment of data collected from July 1998 through August 2001, WHO retrospectively verified 578 outbreaks, of which 56% were initially picked up and disseminated by GPHIN.[9] Outbreaks were reported in 132 countries, demonstrating GPHIN's capacity to monitor events occurring worldwide, despite the limitation of predominantly English (with some French) media sources.

One of GPHIN's earliest achievements occurred in December 1998, when the system was the first to provide preliminary information to the public health community about a new strain of influenza in northern People's Republic of China.[20] During the SARS outbreak, declared by WHO in March 2003, the GPHIN prototype demonstrated its potential as an early-warning system by detecting and informing the appropriate authorities (e.g., WHO, Public Health Agency of Canada) of an unusual respiratory illness outbreak occurring in Guangdong Province, China, as early as November 27, 2002. GPHIN was further able to continuously monitor and provide information about the number of suspected and probable SARS cases reported worldwide on a near real-time basis. GPHIN's information was ≈2–3 days ahead of the official WHO report of confirmed and probable cases worldwide.

In addition to outbreak reporting, GPHIN has provided information that enabled public health officials to track global effects of the outbreak, such as worldwide prevention and control measures, concerns of the general public, and economic or political effects. GPHIN is used daily by organizations such as WHO, the US Centers for Disease Control and Prevention (CDC), and the UN Food and Agriculture Organization.