Discussion:
Host genetic variation impacts microbiome composition across human body sites
(too old to reply)
XXIIIx
2017-03-29 21:22:05 UTC
Permalink
Host genetic variation impacts microbiome composition across human body sites

Ran BlekhmanEmail author, Julia K. Goodrich, Katherine Huang, Qi Sun, Robert Bukowski, Jordana T. Bell, Timothy D. Spector, Alon Keinan, Ruth E. Ley, Dirk Gevers and Andrew G. Clark
Genome Biology201516:191
DOI: 10.1186/s13059-015-0759-1© Blekhman et al. 2015
Received: 16 June 2015Accepted: 24 August 2015Published: 15 September 2015
Abstract

Background
The composition of bacteria in and on the human body varies widely across human individuals, and has been associated with multiple health conditions. While microbial communities are influenced by environmental factors, some degree of genetic influence of the host on the microbiome is also expected. This study is part of an expanding effort to comprehensively profile the interactions between human genetic variation and the composition of this microbial ecosystem on a genome- and microbiome-wide scale.

Results
Here, we jointly analyze the composition of the human microbiome and host genetic variation. By mining the shotgun metagenomic data from the Human Microbiome Project for host DNA reads, we gathered information on host genetic variation for 93 individuals for whom bacterial abundance data are also available. Using this dataset, we identify significant associations between host genetic variation and microbiome composition in 10 of the 15 body sites tested. These associations are driven by host genetic variation in immunity-related pathways, and are especially enriched in host genes that have been previously associated with microbiome-related complex diseases, such as inflammatory bowel disease and obesity-related disorders. Lastly, we show that host genomic regions associated with the microbiome have high levels of genetic differentiation among human populations, possibly indicating host genomic adaptation to environment-specific microbiomes.

Conclusions
Our results highlight the role of host genetic variation in shaping the composition of the human microbiome, and provide a starting point toward understanding the complex interaction between human genetics and the microbiome in the context of human evolution and disease.

Background

Recent advances in high-throughput sequencing technologies have unveiled wide variability in the microbial communities that coat the human body [1, 2]. There are differences in the microbiota across body sites, which constitute distinct ecological niches [1, 3, 4]. Within each body site, the composition of the microbiome may change rapidly, but community features can remain constant for years [5, 6]. There is great variability in the microbiome across individuals, with some differences associated with chronic conditions, including obesity, diabetes, and inflammatory bowel disease (IBD) [7, 8, 9, 10, 11, 12]. Recent studies in germ-free animals have shown that these shifts in the microbiome can have an effect on host traits and could be causal in disease phenotypes [7, 12, 13, 14]. Therefore, understanding the factors that impact the composition of the microbiome in healthy individuals is critical to elucidate the role of the microbiome in disease and for development of therapeutics targeting the microbiome.

The composition of the human microbiome is influenced by multiple environmental factors. For example, changes in host diet affect gut microbiome communities at the taxonomic and functional level [5, 15]. In addition, intake of drugs and antibiotics can modulate the gut microbiome [16, 17]. Moreover, studies have shown variation in the gut microbiome can be controlled by interactions with pathogens and parasites [18, 19]. Lastly, social contact and interaction with the environment have also been implicated in shaping the microbial flora in the gut and skin [20, 21, 22].

Along with this clear evidence for the influence of environmental factors, there is also support for a host genetic component in structuring of human microbial communities [23]. For example, single nucleotide polymorphisms (SNPs) in the MEFV gene are associated with changes in human gut bacterial community structure [24], and IBD-risk loci are associated with changes in gut microbiome composition [25]. Researchers have also shown that a loss-of-function polymorphism in the gene FUT2, which is a known risk factor for Crohn’s disease, may modulate energy metabolism of the gut microbiome [26]. Investigating individuals with inflammatory bowel disease, Knights et al. have shown that NOD2 risk allele count is correlated with an increase in the relative abundance of Enterobacteriaceae [27].

In addition to targeted and candidate gene approaches, researchers have also used host genome-wide genetic variation to find interactions with the microbiome. For example, in a recent study using 416 twin pairs to assess the heritability of the microbiome, Goodrich et al. identified microbial taxa for which relative abundance is more similar in monozygotic compared to dizygotic twins [14]. In the laboratory mouse, quantitative trait locus (QTL)-mapping approaches have found multiple loci associated with gut microbial community composition, some of which overlap genes involved in immune response [28, 29]. Moreover, researchers have shown that host mitochondrial DNA haplogroups are correlated with the structure of microbiome communities [30]. However, to date, there are no genome-wide studies that attempt to characterize specific genes and pathways in the human genome that shape the composition of the microbiome, although the value of such studies has often been suggested [31, 32].

Here, we performed a genome-wide analysis to identify human genes and pathways correlated with microbiome composition, using data generated by the Human Microbiome Project (HMP). In the last few years, the HMP has sampled and cataloged the microbial diversity across multiple body sites in a few hundred individuals [33]. Since genotype data are not yet available for the individuals included in the HMP study, we extracted host genomic information from the ‘human contamination’ reads in the HMP shotgun metagenomic sequencing. This allowed us to generate genome-wide genetic variation data from 93 individuals, which we then tested for association with the microbiome profiles generated by the HMP.

Results and discussion

Mining the human microbiome project data for host reads
First, we scanned and identified the short reads in the metagenomic sequencing data that map to the human genome. By combining these reads across body sites (primarily originating from nares and cheek swabs [33]) for each individual (Additional file 1: Figure S1), we attained a mean depth of coverage of more than 10 reads per base pair per individual (Additional file 1: Figure S2). Combining all 93 individuals, the mean depth of coverage for each site is 1,061 reads (median 1,093), and 99 % of sites are covered at >500x summed across individuals. There is noticeable variability across individuals, although most individuals have a mean coverage in the range of 5x-20x (Additional file 1: Figure S3). We performed genotype calling on these individuals using stringent quality controls and filtering, and identified a final set of 4.2 million high-quality and informative single nucleotide polymorphisms (SNPs), of which 92 % were previously known and found in dbSNP, and were used in subsequent analyses (Additional file 1: Figures S1 to S10). The number of SNPs we identified is in line with previous reports using whole-genome sequencing in humans [34].

Correlation between host genetic variation and microbiome composition
First, we examined the correlation between host genetic variation and the overall diversity of the microbiome. At this point we attempted to identify gross correlation signatures, still without accounting for population structure, and deferring the discussion of mechanistic causes for these correlations until later in the paper. We calculated the coordinates underlying variability in the host genetic data using multidimensional scaling (MDS). We then calculated alpha diversity, a measure of within-sample microbial diversity within each body site (that is, richness within a sample), and found it to be correlated with the first coordinate of host genetic variation data in the anterior nares (Fig. 1a, R 2 = 0.207, P = 0.039) and the right retroauricular crease (Additional file 1: Figure S11, R2 = 0.218, P = 0.01). In addition, we found correlations in several additional coordinates; for example, the third principal component (PC) of host genetic variation is correlated with alpha diversity in the supragingival plaque, the throat, and the tongue dorsum (Additional file 1: Figure S11). Reduced alpha diversity has been previously linked to different health conditions (for example, inflammatory bowel disease [7], type 2 diabetes [11], and obesity [35]), and our results suggest a possible role for host genetics in controlling the alpha diversity. Next, we looked for correlations of host genetics with the overall composition of the microbiome. We found correlations between the first host genetic principal coordinate and microbiome PCs in the stool and palatine tonsils (Fig. 1b and Additional file 1: Figure S12). We also found correlations at a number of other body sites, although most were not statistically significant after multiple test correction (Additional file 1: Figures S12-S17). Nevertheless, taken together, these correlations suggest a potential relationship between host genetics and microbiome composition.

Fig. 1
Host genetic variation is correlated with microbiome composition. a Correlation of the first PC of host genetic data (x-axis) and alpha diversity of the anterior nares microbiome (y-axis). b Correlation of the first PC of host genetic data (x-axis) and first PC of the stool microbiome data (y-axis). c Identity-by-state between individual pairs calculated from host genome data (x-axis) is correlated with stool microbiome beta diversity (y-axis), which tabulates the magnitude of pairwise differentiation between the microbiomes of same pair of individuals. In all panels, solid and dashed gray lines represent a linear regression and loess regression fit to the data, respectively
This dataset also allows us to compare between-individual differences in the microbiome and host genetic variation. We correlated microbial beta diversity (that is, between-sample diversity) at each body site with genome-wide identity-by-state, a statistic estimating similarity in genome sequence between pairs of individuals. We found that identity-by-state is significantly negatively correlated with beta diversity in 10 of the 15 body sites (Additional file 1: Figure S13), including in the stool (Fig. 1c, R 2 = 0.19, P <1015), anterior nares, hard palate, palatine tonsils, saliva, supragingival plaque, throat, and tongue dorsum (P <0.01 in each of the 10 body sites). These results indicate that the similarity in genome sequence is positively correlated with microbiome similarity, supporting a relationship between host genetic variation and the microbiome at a large scale. However, this pattern may be partly driven by population stratification, or non-genetic environmental factors that are correlated with genetic ancestry. For example, previous studies have found differences in the gut microbiome between human populations [36, 37], so geographic stratification could drive a biologically non-causal correlation between genetic ancestry and local diet, and thus with gut microbial composition.

Host genes and pathways correlated with microbiome composition
In an effort to control for population structure, in addition to other non-genetic factors that may be driving spurious correlations, we analyzed the data using a linear mixed model. The additive effects model included as covariates possible confounders, such as gender, sample collection location, sequencing center, and the first five coordinates from the MDS analysis of the host genotypic data. By including these covariates we are attempting to correct for effects of individual ancestry and extrinsic factors on the microbiome. We note that there are additional potential confounding factors that we could not account for in our model; for example, physical interaction between individuals, which has been shown to affect microbiome composition in primates [20], is not included, as these data were not collected by the HMP. We ran this model genome-wide, correlating host genetic variation in each SNP with the first five PCs of the microbiome in each of the 15 body sites. In addition to controlling for confounders, this genome-wide approach also allows us to identify specific loci in the host genome that are correlated with the microbiome, and understand their likely functional effect in the host. We recognize at the outset that our sample size is an order of magnitude smaller than most genome-wide association studies (GWAS), precluding us from being able to perform a standard test of association between microbiome composition and each SNP. Therefore, instead, we used a pathway-based analysis, whereby we aggregated SNPs into pathways in order to learn about the biological functions and processes that underlie interactions between host genome and the microbiome. We note that this is a common analysis approach for genome-wide association data, driven by the rationale that complex traits are controlled by multiple genetic effects, which could originate in different genes, but are likely to aggregate in the same biological pathway or function. The approach is aiming to identify these functions by looking for enrichments of biological functional categories among a set of associated genetic loci. Specifically, we first aggregated SNPs that were correlated with at least one microbiome PC at an arbitrary nominal cutoff of P ≤10−6 (using several other P value thresholds did not change the results; see Additional file 2: Tables S1 and S2). We then identified overlapping or nearby genes, and used these gene sets to perform a functional enrichment analysis.

Using this approach, we found the most significant enrichment with genes involved in pathway Leptin Signaling in Obesity (P = 2.29 × 10−7, Additional file 2: Table S1). Leptin is a hormone whose structure places it in the cytokine superfamily. It has been linked to the microbiome in several recent studies, mainly using leptin-deficient ob/ob mice [13, 38]. Leptin has several important roles in immunity, including activation of monocytes, neutrophils, and macrophages, and modulation of inflammation [39]. Leptin may also impact the microbiome indirectly in its role as a hormone, whereby it regulates appetite and body weight, affects basal metabolism, and regulates insulin secretion, among other functions [39]. The enrichment identified here is driven by significant correlations of host genetic variation with microbiome PCs in the nose, oral cavity, and skin (see Additional file 2: Table S1). Studies have shown that the leptin is expressed and has a functional role in the mouth [40]. Leptin and leptin receptor are expressed in the skin [41], and may have a functional role in wound healing and psoriasis [42, 43]. Moreover, leptin is expressed in nasal polyps, and may affect the expression of mucin genes in polyp epithelial cells [44]. Nevertheless, the role of leptin in interactions with microbial flora in these body sites is still not well understood.

In addition to leptin signaling, several other immunity-related pathways are enriched among microbiome-correlated host genes, such as Melatonin Signaling, JAK/Stat Signaling, Chemokine Signaling, CXCR4 Signaling, and Role of Pattern Recognition Receptors in Recognition of Bacteria and Viruses (Additional file 2: Tables S1 and S2). To further investigate the role of host genetic variation in immunity-related genes on the microbiome, we used the InnateDB database, and identified additional enriched pathways, including Interleukin-12-Mediated Signaling Pathway, GABAA Receptor Activation, Inositol Phosphate Metabolism, IL2, CXCR4-Mediated Signaling Events, and GnRH Signaling Pathway (Additional file 2: Tables S3 and S4). In addition, we found enrichment of genes in the REACTOME pathway Sulfide Oxidation to Sulfate, suggesting a potential role for host genetic variation in genes determining sulfate abundance in controlling microbial composition. We also found enrichment in the KEGG pathway Primary Bile Acid Biosynthesis. Recent studies have shown that the microbiome can modulate bile acid metabolism [45], and our results support a possible role for host genetic variation in bile acid metabolic pathways in interacting with the microbiota.

Next, we examined correlations between microbiome composition and host genetic loci that had been found to be associated with complex disease. For that purpose, we used the GWAS catalog [46], and looked for enrichment of genes found to be associated with specific complex disease. For each disease in the catalog, we plotted the overlap between the genes associated with the disease and the genes found in our study to be associated to microbiome composition. Plotting this overlap over a range of P value cutoffs for each GWAS dataset, we detected enrichments in a number of diseases (Fig. 2a). We found enrichments in genes associated with several complex diseases for which a role for the microbiome has been shown, such as ulcerative colitis [47], inflammatory bowel disease [48], obesity-related traits [7], and HDL cholesterol and triglycerides. In addition, we found enrichment of genes associated with metabolite levels and metabolic traits, for which an interaction with the microbiome has been observed [35].

Fig. 2
Complex disease and functional SNPs are enriched among microbiome-correlated host genetic variation. a Enrichment of genes correlated with microbiome composition (y-axis) compared to all other genes that are significantly associated with a complex disease using a given P value threshold (x-axis). Each colored line represents a different complex disease with an enrichment of at least three-fold. b Enrichment of SNPs correlated with microbiome composition (y-axis) compared to all other SNPs that have been identified as eQTLs in the GTEx data using a given P value threshold (x-axis). Each colored line represents a different tissue type analyzed by GTEx. c Enrichment of SNPs (blue) and genes (red) correlated with microbiome composition in this study (y-axis) among SNPs and genes correlated with microbiome composition in the TwinsUK dataset using a given P value threshold (x-axis)
We used a similar approach to identify enrichment of SNPs annotated as expression quantitative trait loci (eQTLs) among the sites we found to be correlated with microbiome composition (Fig. 2b). We found an enrichment of eQTLs in several tissues that were identified in the GTEx project [49]. This result indicates that the loci we identified in our analysis as correlated with microbiome composition are likely to have a functional role in regulating gene expression. Lastly, we sought to validate our results using an independent cohort. We followed a similar approach to identify correlations between GI tract microbiome PCs and host genetic variation in 984 individuals from the TwinsUK project cohort [14, 50]. We find an enrichment of SNPs correlated with microbiome composition in both studies (Fig. 2c; P = 0.028 using Fisher’s exact test for significant overlap between the two sets of SNPs). When considering genes located nearby correlated SNPs, the enrichment becomes more prominent; possibly indicating that different SNPs may control similar microbiome-linked genes and pathways.

Host genetic variation correlated with bacterial taxa
In addition to identifying interactions with the overall structure of the microbiome, we were interested in finding correlations between host genetic variation and specific bacterial taxa. To do so, we tested for correlation between genetic variation and relative abundances of bacteria derived from the HMP 16S rRNA gene sequences. Abundance data from HMP OTUs were parsed, extensively filtered, normalized, and taxonomically collapsed, to achieve a single representation for each taxon at the genus level or above (see Additional file 1: Figures S14-S19 and Additional file 3). After filtering inter-correlated taxa, our final dataset included 615 microbiome abundance traits in 15 body sites. In an effort to reduce the number of statistical tests, we included in the analysis only host SNPs located within protein-coding sequences.

Using this approach, we found 83 associations between genetic variation in host coding sequence and abundance of specific microbial taxa (genome-wide false discovery rate Q-value <0.1). These 83 associations are described in Additional file 2: Table S5. Among these, we find several key host genes related to immunity, such as HLA-DRA (P = 3.72 × 10−6) and TLR1 (P = 5.04 × 10−6), which we found to be correlated with abundance of Selenomonas in the throat and Lautropia in the tongue dorsum, respectively. Another interesting correlation was found between host genetic variation in SNPs in the LCT gene and the abundance of Bifidobacterium in the GI tract (P = 1.16 × 10−5, Fig. 3a, b). LCT encodes the lactase enzyme, which is expressed in the GI tract and acts to hydrolyze lactose, the sugar found in dairy products. Intriguingly, Bifidobacterium can metabolize lactose, and reports show that some strains prefer lactose to glucose [51]. Since genetic variants in and around LCT are directly linked to lactase persistence [52], it is likely that the variants we observed dictate an individual’s consumption of milk products, which in turn may regulate the abundance of Bifidobacterium in the GI tract. Although the data do not provide sufficient resolution to discriminate the Bifidobacterium species that drives this association, further analytical and experimental approaches may shed light on this result.

Fig. 3
Correlation between coding genetic variation and bacterial abundance. a Manhattan plot illustrating the P values (y-axis, −log scale) for correlation of each tested coding SNP (shown as circles) by its genomic location (x-axis) with the abundance of Bifidobacterium in the gut. SNP colors alternate by chromosome, with red dots representing SNPs with P values that surpass genome-wide significance after FDR correction. b A close-up of the region of correlation within LCT. Genomic positions on chromosome 2 are on the x-axis, and the P values are on the y-axis (−log scale). Each dot represents a SNP tested using our model, and the color represents the linkage disequilibrium (r 2 ) between each dot and the top SNP, colored purple and indicated by its dbSNP rsID (inset legend indicates the spectrum of colors and matching r 2 values). Blue lines represent recombination rate calculated from the European samples in the 1000 Genomes Project. Gene regions are shown underneath, with LCT highlighted. c An interaction network generated using IPA showing pathways that are enriched among genes that harbor SNPs correlated with abundance of bacterial taxa (in orange). Lines represent known interactions between genes, and shapes represent types of proteins (see legend at the bottom left)
Using pathway enrichment approaches described above, we found that genes linked to abundance of bacterial taxa are over-represented with relevant diseases (Additional file 2: Table S6), including transendothelial migration of lymphocytes, meningitis, and several cancer categories, including gastrointestinal adenocarcinoma, growth of mammary tumor, head and neck tumor, and thyroid cancer. To further visualize the interactions between these genes, we used the Ingenuity Pathway Analysis knowledgebase, which holds curated information on molecular pathways and protein interactions, and identified several networks significantly enriched with genes correlated with bacterial taxa abundances (Additional file 2: Table S7). Figure 3c displays the highest-scoring network, containing genes involved with cellular movement, hematological system development and function, and immune cell trafficking.

Lastly, we investigated the evolutionary pressures acting on the SNPs we found to be correlated with microbiome composition. To do so, we used F ST, a measure of allele frequency differentiation between human populations, calculated from the 1000 Genomes Project data (see Materials and Methods) [34]. Comparing F ST between four human populations (African, American, Asian, and European), we found that SNPs that were linked to microbial communities in our study have higher F ST values compared to the rest of the genome (Fig. 4; FDR Q <0.05 for the highlighted comparisons using a permutation test on the medians; see Additional file 3). Interestingly, we found that in some body sites, the microbiome is linked to genes with higher F ST values across most population comparisons; for example, the oral cavity microbiome is linked to higher F ST in all pairwise comparisons among populations, except Asian vs. European. In addition, specific population pairs seem to be enriched with higher F ST across body sites; for example, both the African vs. Asian and the American vs. Asian comparisons show high F ST values in the genes that interact with microbial communities in three of the four body sites (oral cavity, GI tract, and airways). Overall, 12 of the 24 comparisons yielded significantly high F ST compared to the genome-wide average, while six comparisons yielded significantly lower values.

Fig. 4
SNPs correlated with microbiome composition have high FST values between human populations. Each panel represents a comparison of a pair of human populations indicated in the title. Shown is the F ST median + 95 % CI (x-axis, calculated using bootstrapping) in SNPs where genetic variation is correlated with microbial taxa at P <10−4, separated by the body site (y-axis). Vertical dashed line represents the genome-wide median FST. Color highlight was used in cases where F ST in microbiome-correlated sites was significantly higher than the genome-wide value (FDR Q <0.05; using a permutation test of the median)
These results suggest that host genetic variation that is linked to microbial variation is enriched with sites that evolve under differential selection pressures across human populations. This is consistent with the notion of local adaptations to population-specific microbiomes, possibly controlled by environmental conditions for each population. Given that genes that we found to be linked to microbiome composition are enriched with immunity-related genes and pathways, this result may not be surprising; indeed, genetic variation in immune genes has long been associated with higher rated of positive selection in human populations [53]. However, these selective pressures were hypothesized to be mainly a result of interaction with pathogens. Our results indicate that selection pressures on immunity genes and pathways may also be due to interaction with commensal microbial communities that accompany changing environments. Another potential explanation for this pattern is that past selection pressures against pathogens have driven changes in immunity genes that affect the commensal microbiome as a byproduct. Although distinguishing between these hypotheses is not possible using currently available data, the end result – commensal microbial traits affected by past selection events on host genes – is an exciting finding that we hope would be explored further in the future.

Conclusions

We describe an analysis of host genetic variation data mined from the metagenomic shotgun sequencing performed by the Human Microbiome Project. The ability to mine host genetic material from metagenomic shotgun sequence data has recently raised several privacy concerns [54]. We note that in the current study, informed consent for sequencing of host DNA was given by the participants, although this is not a common procedure for metagenomics studies. We show here that it is possible to reconstruct complete host genomes using metagenomic sequence data, which is potentially identifiable. However, this was possible due to the unique study design of the HMP, whereby multiple body sites from each individual were sequenced at a high depth, allowing us to pool data across body sites and reach a 10x mean coverage per host genome. Common metagenomic shotgun sequencing studies, which usually include an order of magnitude less sequence data, are unlikely to enable such an analysis. Moreover, the majority of studies sequence stool samples, which include many fewer host-derived reads. Nevertheless, we anticipate that future shotgun metagenomics sequencing studies would consider these potential privacy concerns.

The analysis described in this paper focused on the taxonomic structure of the microbiome. However, it would be interesting to incorporate the functional composition of the microbiome when considering associations with host genetic variation. Indeed, several studies have highlighted the importance of shotgun metagenomics for uncovering the genic composition and metabolic capacity of the microbiome [1, 48]. A similar analysis would be critical to uncover functional interactions that could not be detected by looking at community and taxonomic composition. In addition, there are several environmental factors that could influence the microbiome, such as diet, which were not included in our analysis. We expect that the inclusion of such potential confounders in future studies would help to further disentangle the effects of environment and host genetic variation on the microbiome.

Our analysis has shown that host genetic variation in immunity-related pathways is correlated with microbiome composition. These results are consistent with recent reports of host immunity involvement in modulating microbiome structure, for example through production of antimicrobial compounds [55] or inflammation [56]. Additionally, many recent studies have shown that a mice with a knocked-out immune gene display dramatic changes in their microbiota [57, 58, 59, 60]. Moreover, genetic variation in immune genes in the mouse was found to be correlated with the composition of the microbiome [61]. In addition, our results show that the host variants and genes that are correlated with the structure of the microbiome are enriched in genes associated with complex disease that have been linked to the microbiome. This result is not surprising, considering that recent studies in the mouse have shown that microbiome QTLs overlap complex disease-linked genes [28, 29]. Taken together, these findings motivate the need for larger association studies to characterize host genetic variation linked to the microbiome in the context of various health conditions, environmental effects, and genetic backgrounds. Moreover, functional studies, for example using cells or animal models, would be crucial for elucidating the causal mechanisms whereby human genetic variation impacts the microbiome.

Materials and methods

Read More:

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0759-1
Taka
2017-04-10 06:20:34 UTC
Permalink
It's dangerous to enforce preset microbiome on people, e.g. via simple feeding of fermented foods like joghurts or natto.... Best way is to build yo own customized microbiome through eating the right foods (prebiotics).
Loading...