Age and Fecal Microbial Strain-Specific Differences in Patients With Spondyloarthritis

Matthew L. Stoll; Pamela F. Weiss; Jennifer E. Weiss; Peter A. Nigrovic; Barbara S. Edelheit; S. Lou Bridges Jr; Maria I. Danila; Charles H. Spencer; Marilynn G. Punaro; Kenneth Schikler; Andreas Reiff; Ranjit Kumar; Randy Q. Cron; Casey D. Morrow; Elliot J. Lefkowitz


Arthritis Res Ther. 2018;20(14) 

In This Article



Pediatric SpA subjects were mostly children with enthesitis-related arthritis (ERA; juvenile SpA) as per the International League of Association for Rheumatology criteria;[20] three with sacroiliitis in the absence of peripheral arthritis met the Assessment of Spondyloarthritis International Society (ASAS) criteria for axial SpA.[21] Pediatric controls were either healthy children recruited through advertisements or children referred to rheumatology for evaluation of arthritis but found to have noninflammatory causes of joint pain or irrelevant laboratory markers, such as a positive antinuclear antibody. None of the controls had any findings suggestive of infectious arthropathies, including Lyme synovitis. Pediatric subjects were recruited from eight sites around the country (Table 1). Exclusion criteria were receipt of antibiotics within 3 months prior to study enrollment and prior treatment with any immunosuppressive agent excluding less than 2 weeks of corticosteroids. None of the pediatric subjects or controls had been included in our prior work.[8]

Adult SpA subjects were all recruited at the University of Alabama at Birmingham (UAB). Patients met either the ASAS criteria for peripheral[22] or axial[21] SpA or the Classification Criteria for Psoriatic Arthritis (CASPAR) criteria[23] for psoriatic arthritis (PsA). Sacroiliitis in both children and adults was defined based upon either clinical findings per the judgment of the treating rheumatologist or imaging findings on MRI or radiography; only a single observer was required. Adult controls were healthy volunteers recruited through advertisements or patients with noninflammatory causes of joint pain (Table 2). For the adult subjects, as with the pediatric counterparts, recent antibiotic use was an exclusion criterion; however, for the adult subjects, immunosuppressive therapy was not.

Processing of Fecal Samples

Subjects collected the samples at home and immediately placed the samples in a 50-ml container filled with Cary-Blair media. Samples were shipped overnight via commercial carrier to the microbiota core at UAB. Microbial genomic DNA was isolated by standard methods using kits from Zymo Research (Irvine, CA, USA) as per the manufacturer's instructions. For 16S sequencing, the Zymo ZR Fecal DNA MiniPrep kit (catalog # D6010) was used; for shotgun metagenomics, the whole genome DNA kit (catalog # D6110) was used.

Sequencing and Analysis of 16S Ribosomal DNA From the Fecal Specimens

The purified DNA (~100 ng) underwent PCR amplification using primers designed to the conserved region flanking the V4 region from the 16S ribosomal DNA (rDNA) gene, as described previously.[8] Resulting PCR fragments were run on the Illumina MiSeq (San Diego, CA, USA) at a concentration of 12 pM; read lengths were approximately 250-bp paired-end reads. Initial steps of the analysis were performed with the Divisive Amplicon Denoising Algorithm (DADA2).[24,25] Unlike some of the widely used algorithms,[26,27] DADA2 does not cluster similar sequences. Instead, it uses error modeling to distinguish amplification or sequencing errors from true sequences, which are typically referred to as sequence variants (SVs). This assessment is based upon features of the sequence, specifically the distance from its nearest neighbor as measured by the number of nucleotide differences and as influenced by quality scores and specific nucleotide transitions; and its abundance. Thus, a highly abundant sequence may be assessed as a true SV rather than a sequence error, even if it is highly similar to another abundant SV. Consequently, DADA2 can identify unique strains within a species. The output of DADA2 is an abundance table, in which each unique sequence is characterized by its abundance in each sample. Taxonomic information was incorporated into this abundance table with the ribosomal database project naïve Bayesian classifier[28] using the May 2013 version of Greengenes.[29] This abundance table has the same structure as a traditional operational taxonomic unit table that is the analytic unit in the Quantitative Insight Into Microbial Ecology (QIIME) tool suite.[26] Thus, it was incorporated into QIIME with the biom convert script, and assessment of alpha and beta diversity was performed within QIIME. Alpha diversity indexes of richness and evenness were assessed with the Chao1 and Shannon measures respectively; beta diversity was assessed with the Bray Curtis measure, visualized using principal coordinates analysis (PCoA). Variables assessed were presence versus absence of arthritis, and among the ERA patients were geographic location, presence versus absence of sacroiliitis, and HLA-B27 status. Geographic location was assessed both by looking at each site as a separate unit and by pooling into one region the four sites located in the northeast (Boston Children's Hospital, Connecticut Children's Medical Center, Hackensack University Medical Center, and Children's Hospital of Philadelphia), a geographic area that is similar in size to the catchment area of UAB. To identify specific strains within F. prausnitzii, the 16S sequences were downloaded from the National Library of Medicine, and the SVs matching F. prausnitzii were aligned against the known sequences using BLAST. The code used for the 16S sequence analysis is available in Additional File 1.

Sequencing and Analysis of Shotgun Sequencing From the Fecal Specimens

The purified DNA was sheared and 500–600-bp fragments were purified by agarose gel electrophoresis. Illumina kits were used for adapter ligation followed by Illumina HiSeq, 125-bp paired-end sequencing. Approximately 25–30 million reads per sample were obtained from each sample. A second set was run on the Illumina MiSeq, yielding 3–7 million reads per sample, approximately 250-bp paired-end sequencing. Quality control of both sets was performed using PRINSEQ.[30] Alignment with the human genome to remove human reads was performed with Bowtie2[31] using the very-sensitive mode. Conversion of fastq to fasta files was performed with the FASTX tool kit ( The remainder of the analysis was performed with the HMP Unified Metabolic Analysis Network (HUMAnN V2) program.[32] A detailed description of the program is available online ( Briefly, the input fasta files were aligned against a functionally annotated pan-genome database of over 4000 species, named ChocoPhlan by the HUMAnN2 investigators, using Bowtie2.[31] Reads not mapped to ChocoPhlan were subsequently aligned against the UniProt universal proteins database[33] using double index alignment of next-generation sequencing data (DIAMOND),[34] a more efficient alternative to BLASTX. Pathway assignments were performed with MetaCyc;[35] all genes associated with a pathway must be present for the pathway to be considered a match. Gene and pathway abundances were normalized to the sequencing depth prior to any further analyses. Translation from UniProt to Kyoto Encyclopedia of Genes and Genomes (KEGG)[36] was performed with the map_ko_uniref50.txt.gz mapping file made available by the HUMAnN investigators. The output of HUMAnN2 is files containing the abundance of each gene as well as of each pathway, optionally normalized to the sequencing depth of each subject. The code used for the HUMAnN2 analysis is included in Additional File 1.

Statistical Analysis

To evaluate whether the samples as a whole clustered according to nominal variables, the permutation multivariate analysis of variance (PERMANOVA) test was run against the distance matrix generated from the Bray Curtis test of beta diversity. The PERMANOVA test partitions a distance matrix among sources of variation (e.g., presence versus absence of arthritis) in order to describe the strength and significance that a categorical variable has in determining the variation of distances.[37] Comparisons in the abundance of specific taxa, as well as assessments of differences in alpha diversity, were performed with Student's t test. Pairwise comparisons performed at each of the phylogenetic levels (e.g., phylum) were corrected for multiple comparisons with the Benjamini–Hochberg false discovery rate (FDR) test[38] with a significance threshold of 0.05.