Early Childhood Gut Microbiomes Show Strong Geographic Differences Among Subjects at High Risk for Type 1 Diabetes

Kaisa M. Kemppainen; Alexandria N. Ardissone; Austin G. Davis-Richardson; Jennie R. Fagen; Kelsey A. Gano; Luis G. Léon-Novelo; Kendra Vehik; George Casella; Olli Simell; Anette G. Ziegler; Marian J. Rewers; Åke Lernmark; William Hagopian; Jin-Xiong She; Jeffrey P. Krischer; Beena Akolkar; Desmond A. Schatz; Mark A. Atkinson; Eric W. Triplett


Diabetes Care. 2015;38(2):329-332. 

Research Design and Methods

The TEDDY study prospectively observes children at six clinical centers in Europe (Finland, Sweden, and Germany) and the U.S. (Colorado, Washington state, and Georgia/Florida).[1] A total of 1,129 stool samples were collected from 90 children, 15 from each study site. Sampling was monthly, starting on average at 151.1 days after birth (SE 5.5 days) and ending on average at 537 days after birth (SE 4.5 days). Samples were collected at home and mailed to a TEDDY repository within 72 h, with ice packs during the summer months.[1,12] Storage of fecal samples at room temperature for up to 72 h changes bacterial composition by no more than 10%.[13] All subjects carried the highest-risk HLA class II genotype (DR4-DQA1*030X-DQB1*0302/DR3-DQA1*0501-DQB1*0201), as determined by genotyping of cord blood,[1] but neither autoantibodies nor disease developed during the sample collection period. Clinical data were collected on gestational age, delivery mode, sex, and early feeding practices (age at first introduction to formula, and duration of exclusive and any breastfeeding), and later on diet (age at first introduction to oats, gluten, milk products, cow milk, and solid food).[1]

DNA was isolated from frozen stool samples as previously described.[13] Extracted DNA was purified using the PowerClean DNA Kit (MO BIO Laboratories, Inc., Carlsbad, CA). 16S rRNA amplification, sequencing using a barcoded Illumina approach, sequence analysis, read trimming, and taxonomic classification were performed as previously described.[14]

Samples with <10,000 reads were removed from the data set, as were any operational taxonomic units with <50 reads in at least one sample. After filtering, samples averaged 102,147 reads (SE 1,151), of which on average 43.9% (SE 0.1%) were successfully classified at the genus level. The relative abundances of bacterial genera were calculated as the percentage of classified reads. Sequences that did not map to known genera were clustered to each other at 95% similarity. The original sequences were submitted to MG-RAST under project identification #3229. The bacterial diversity of each sample was determined by calculating the Shannon diversity index (SDI).
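
The two per-sample quantities described above, relative abundance and the SDI, can be illustrated with a minimal Python sketch. This is not the study's pipeline: the function names and the read counts are hypothetical, and the natural logarithm is an assumption, since the text does not state the log base or whether the SDI was computed on genera or on operational taxonomic units.

```python
import math


def relative_abundance(genus_counts):
    """Return each genus's share of the classified reads, in percent."""
    total = sum(genus_counts.values())
    if total == 0:
        raise ValueError("sample has no classified reads")
    return {genus: 100.0 * n / total for genus, n in genus_counts.items()}


def shannon_diversity(counts):
    """Shannon diversity index H = -sum(p_i * ln(p_i)) over nonzero counts.

    Assumption: natural logarithm; the source does not state the base.
    """
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


# Hypothetical read counts for one sample (illustrative, not study data).
sample = {"Bifidobacterium": 600, "Bacteroides": 300, "Veillonella": 100}
abundance = relative_abundance(sample)
sdi = shannon_diversity(sample.values())
```

A sample dominated by a single genus gives an SDI near zero, while reads spread evenly over many taxa give a higher value.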

Data analysis was performed using R statistical software version 3.0.0[15] or SAS version 9.3 (SAS Institute Inc., Cary, NC). Demographic, clinical, and dietary variables were assessed by site. Categorical variables were analyzed using the Pearson χ² test or Fisher exact test. Continuous variables were tested for differences in means using one-way ANOVA or the Kruskal-Wallis test. Generalized estimating equations for longitudinal correlated data were used to assess the association between geographical location and bacterial abundance and diversity, adjusting for demographic, clinical, and dietary variables. Separate models were examined for each bacterial genus under study. A permutation test and the F statistic were used to determine whether SDI differed among the six TEDDY study sites, as previously described.[16] P values <0.05 were considered significant.
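
The general idea of a permutation test built on the F statistic can be sketched in Python as follows. The exact procedure of reference 16 is not described here, so this is an assumed, simplified variant: it takes one summary SDI value per child (repeated samples from the same child are correlated and should not be shuffled independently), permutes the site labels, and compares the observed one-way ANOVA F statistic with its permutation distribution. The function names, the default of 2,000 permutations, and all data values are hypothetical.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_f_test(groups, n_permutations=2000, seed=0):
    """Estimate a P value by shuffling values across groups of fixed size."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated P value strictly above zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical per-child SDI summaries for three sites (not study data).
site_a = [1.0, 1.1, 0.9, 1.2, 1.0]
site_b = [2.0, 2.1, 1.9, 2.2, 2.0]
site_c = [3.0, 3.1, 2.9, 3.2, 3.0]
observed_f, p_value = permutation_f_test([site_a, site_b, site_c])
```

Because the reference distribution comes from the shuffled data, this approach does not rely on the normality assumption behind the tabulated F distribution.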