Comparing the Quality of Pro- and Anti-vaccination Online Information

A Content Analysis of Vaccination-Related Webpages

Gabriele Sak; Nicola Diviani; Ahmed Allam; Peter J. Schulz


BMC Public Health. 2016;16(38) 

Source of Data (Webpages)

To reach our aim, an adequate and comparable number of both pro- and anti-vaccination webpages was needed. Our objective was therefore not to obtain a representative sample of vaccination-related webpages, but to systematically analyze a large and equal number of pro- and anti-vaccination online sources and, ultimately, to compare their quality. For this purpose, a pre-determined number of vaccination-related webpages (N = 1394) was obtained from two research studies conducted at the Institute of Communication and Health (ICH) at the Università della Svizzera italiana (USI), based in Lugano, Switzerland.[22] The main objective of these studies was to investigate users' knowledge and beliefs about immunizations after exposure to vaccination-related content (i.e., a 10-min online session). The independent category was exposure to different combinations of high-quality HONcode-certified websites and webpages with anti-vaccination content. Anti-vaccination sites were retrieved via a customization1 of the Google search engine, using keywords such as "vaccination and autism"; "vaccination side effects"; "anti-vaccination movement". To ensure traceability of the data set and to allow further investigations, the researcher archived the URLs of all the webpages processed in the studies (N = 1394). All webpages retrieved for these two studies (i.e., pilot study and experiment) formed the sample for the present content analysis. Webpages were accessed and reviewed between 15 October 2013 and 15 December 2013.

Exclusion Criteria

Webpages were excluded from the analyses if they were duplicates (N = 19), were not written in English (N = 0), were no longer available or retrievable (N = 40), redirected to other web sources (e.g., index pages providing links to articles or news; N = 100) or to a URL other than the one originally shown (N = 5), did not treat the topic of vaccination or treated it only marginally (N = 28), had an insufficient amount of text to evaluate (N = 38), or were delivered as .pdf or similar formats (N = 71). PDF and similar formats were not considered for the present study due to their static nature. This amounted to 300 discarded webpages, which left 1093 webpages for analysis. The exclusion rate of 21.5 % was comparable to those of past studies in the field.[23–25]
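The arithmetic of the exclusion step can be re-checked with a short tally. The sketch below is illustrative only: the dictionary keys are shorthand labels chosen here, not the study's variable names, and the counts are copied from the paragraph above. As printed, the per-criterion counts sum to 301, which agrees with the 1093 retained webpages; the stated total of 300 and the rate of 21.5 % correspond to one page fewer.

```python
# Size of the initial sample, as stated in the text.
INITIAL_SAMPLE = 1394

# Per-criterion exclusion counts copied from the text.
# The keys are shorthand labels (an assumption of this sketch).
exclusions = {
    "duplicate": 19,
    "not in English": 0,
    "no longer available": 40,
    "redirect to other web sources": 100,
    "redirect to a different URL": 5,
    "off-topic or marginal": 28,
    "insufficient text": 38,
    "pdf or similar format": 71,
}


def tally(counts, initial):
    """Return (excluded, remaining, exclusion rate in percent)."""
    excluded = sum(counts.values())
    remaining = initial - excluded
    rate = 100 * excluded / initial
    return excluded, remaining, rate


excluded, remaining, rate = tally(exclusions, INITIAL_SAMPLE)
print(f"excluded={excluded}, remaining={remaining}, rate={rate:.1f}%")
```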

Coding Instruments

In light of the lack of past content analyses focused on quality, a new tool was developed specifically for the purpose of the present study. The tool consisted of three related coding instruments: 1) Online Vaccination Information Quality (OVIQ) codebook; 2) OVIQ code-sheet; and 3) OVIQ checklist. The checklist was designed to simplify raters' coding efforts by providing a clear and comprehensive graphical view of the entire set of quality indicators and their respective values. However, especially in the initial phase of the webpage evaluation, it was highly important that coders properly understood the coding rules, which were stated only in the codebook. The final version of the quality assessment instrument included 40 categories, most of which had a dichotomous value (i.e., 0 = not available/not stated/not detected, and 1 = available/stated/detected). The other categories were either of a qualitative nature (e.g., Ease of Use, Functioning of Links) or quantitative, but with a further degree of specification (e.g., Type of Information, Bar Menu).

Information Quality Categories

The information quality categories included in the coding scheme were derived from relevant literature pertaining to general health information quality and from research conducted on vaccination information.[3,5,7,10,12–14,19,21,23,26–48] Additionally, guidelines developed in the context of several online health information quality initiatives were retrieved with the support of the academic article written by Wilson,[8] and considered for the present investigation (The Health On the Net Foundation, HONcode;[49] URAC2; Netscoring3; eHealth Code of Ethics4; Web Medica Acreditata;[50] and Stanford Persuasive Tech Lab[51]).

All relevant categories were segmented into design and content attributes. Design quality attributes incorporated both criteria considered as fundamental when analyzing webpages in general, irrespective of their subject (i.e., web-related design quality criteria, first 10 categories), and criteria that are more indicative of the quality status of online health resources (3 categories). Content quality attributes were subsequently divided into health-related content attributes pertaining to general health information (12 categories), and vaccination-specific content attributes (15 categories). Figure 1 provides a visual representation of the systems and subsets of categories developed for this study.

Figure 1. Systems and sub-sets of categories

After redundant categories were discarded, the final codebook included 40 categories (see Additional file 1: Table S1).

Raters and Reliability

Two coders carried out the coding process: the first author of the study (GS) and an undergraduate communication scholar familiar with content analysis and trained in applying the coding system (two training sessions of about 2 h each). In a pilot test phase, both raters independently applied the codebook to 20 webpages randomly selected from the full sample. This phase was completed without any major problems (apart from the need to specify a few coding rules more precisely), and both raters considered the majority of categories comprehensible and easy to apply. A formal reliability assessment phase was then conducted. As the results were satisfactory, the undergraduate rater was employed to evaluate an additional 150 vaccination-related webpages of the initial sample (N = 1394).

The reliability index applied was Cohen's Kappa[52] because it is conservative and accounts for chance agreement,[53] and because all the relevant categories of the coding instrument had a nominal status. Implementing the recommendations provided by Lombard and colleagues,[53] the minimum acceptable level of agreement was set at .60.
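For readers unfamiliar with the statistic, a minimal sketch of Cohen's Kappa for two raters and one nominal category follows. It uses the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's marginal frequencies. The function name and the example codes are hypothetical and are not taken from the study's data; only the .60 threshold comes from the text.

```python
from collections import Counter

# Minimum acceptable agreement, as stated in the text.
MIN_KAPPA = 0.60


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters coding the same items on a nominal scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of the raters' marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical dichotomous codes (0 = not detected, 1 = detected) for ten pages.
coder_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
coder_2 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 0]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa={kappa:.2f}, acceptable={kappa >= MIN_KAPPA}")
```

In this made-up example the raters agree on 8 of 10 pages, yet kappa falls just below .60 once chance agreement is removed, which illustrates why the index is described as conservative.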

For testing inter-coder reliability, 100 webpages were randomly selected from the initial sample of 1394 web-links.

As displayed in Additional file 1: Table S1, almost all categories had moderate to perfect agreement levels. Among the quality attributes pertaining to the design macro section, the specific category Ease of Use (navigability) had to be excluded from the computation of indices due to its low level of agreement (k = .45). Also excluded were the target audience sub-option caregivers (k = .57) from the specific section health-related content attributes, and the category risk of not getting vaccinated (k = .33) from the vaccination-specific section. The entire coding instrument (OVIQC) had a high level of agreement (k = .89).

Data Processing and Analysis

The independent category of this study was the general tone of the webpage, which could be pro-vaccination, anti-vaccination, or neutral. General tone was measured as a global assessment of the website's position in the vaccination controversy. The major dependent category was the presence or absence of the different quality indicators. Categories created to highlight content features peculiar either to pro-vaccination webpages only or to anti-vaccination webpages only were not used for comparison purposes. For instance, the category labeled how to get vaccination exemptions legally was clearly designed to evaluate content opposed to the practice of vaccination.

The following indices were computed:

  1. Webpage Design Index was computed from three of the ten original Web-related design quality criteria, plus the three from the Health-specific design quality criteria, and ranged from 0 to 6.

  2. Interactivity Index was computed from two of the ten original Web-related design quality criteria, and ranged from 0 to 6.

  3. Health-Related Content Quality Index was computed from eleven of the twelve original Health-related content attributes, plus two from the original Web-related design quality criteria, and ranged from 0 to 13.

  4. Vaccination-Specific Content Index was computed from five of the fifteen original Vaccination-specific content attributes, and ranged from 0 to 9.

Table 1 shows the four indices along with the categories that make them up. The four indices were summed to form a Total Aggregated Quality Index, ranging from 0 to 34. Based on this index, webpages were classified as poor quality (<15 points), medium quality (15–22 points), and high quality (>22 points).
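The aggregation and the three quality bands can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary keys and function names are hypothetical, and the example page scores are invented. Only the index ranges and the cut-offs are taken from the text.

```python
# Upper bounds of the four indices, as stated in the text (keys are hypothetical).
INDEX_MAX = {
    "webpage_design": 6,
    "interactivity": 6,
    "health_content": 13,
    "vaccination_content": 9,
}


def total_quality(scores):
    """Sum the four index scores after checking each against its stated range."""
    for name, upper in INDEX_MAX.items():
        if not 0 <= scores[name] <= upper:
            raise ValueError(f"{name} must lie between 0 and {upper}")
    return sum(scores[name] for name in INDEX_MAX)


def quality_class(total):
    """Map a total score (0 to 34) to the bands given in the text."""
    if total < 15:
        return "poor"
    if total <= 22:
        return "medium"
    return "high"


# Invented scores for a single webpage.
page = {
    "webpage_design": 4,
    "interactivity": 2,
    "health_content": 9,
    "vaccination_content": 5,
}
total = total_quality(page)
print(total, quality_class(total))
```

The maxima of the four indices (6 + 6 + 13 + 9) add up to the stated upper bound of 34.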

Data analysis included frequency counts, cross tabulations, Pearson's chi-square, and other inferential indicators.
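As an illustration of the cross-tabulation step, the sketch below computes Pearson's chi-square statistic for a contingency table of webpage tone by presence or absence of a quality indicator. The counts are invented for demonstration and do not come from the study. The function is a plain textbook implementation without continuity correction, and obtaining a p-value would additionally require the chi-square distribution, which is left out here.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Invented counts: rows = pro- and anti-vaccination pages,
# columns = indicator present / indicator absent.
observed_counts = [
    [30, 20],
    [10, 40],
]
statistic, dof = pearson_chi_square(observed_counts)
print(f"chi-square={statistic:.2f}, df={dof}")
```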