Skip to article frontmatterSkip to article content

Gut-to-soil axis 16S rRNA analysis tutorial đŸ’©đŸŒ±

Background¶

In this tutorial you’ll learn an end-to-end microbiome marker-gene data science workflow, building on data presented in Meilander et al. (2024): Upcycling Human Excrement: The Gut Microbiome to Soil Microbiome Axis. The data used here is a subset (a single sequencing run) of that generated for the paper, specifically selected so that this tutorial can be run quickly on a personal computer. The full data set for the paper can be found in the study’s Artifact Repository. In the final step (not yet written, as of 17 April 2025), you’ll learn how to adapt the workflow for use in analyzing your own data using Provenance Replay.

The data used in this tutorial was generated using the Earth Microbiome Project protocol. Specifically, the hypervariable region 4 (V4) of the 16S rRNA gene was amplified using the F515-R806 primers - a broad-coverage primer pair for Bacteria that also amplifies some Archaea. Paired-end sequencing was performed on an Illumina MiSeq. Full details are presented in Meilander et al. (2024).

Sample metadata¶

Before starting the analysis, explore the sample metadata to familiarize yourself with the samples used in this study. The following command will download the sample metadata as tab-separated text and save it in the file sample-metadata.tsv. This sample-metadata.tsv file is used throughout the rest of the tutorial.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
wget -O 'sample-metadata.tsv' \
  'https://gut-to-soil-tutorial.readthedocs.io/en/2025.4/data/gut-to-soil/sample-metadata.tsv'

QIIME 2’s metadata plugin provides a Visualizer called tabulate that generates a convenient view of a sample metadata file. Let’s run this, and then we’ll look at the result. Here’s the first QIIME 2 command that you should run in this tutorial:

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime metadata tabulate \
  --m-input-file sample-metadata.tsv \
  --o-visualization sample-metadata.qzv

This will generate a QIIME 2 Visualization. Visualizations can be viewed by loading them with QIIME 2 View. Navigate to QIIME 2 View, and drag and drop the visualization that was created to view it.

Access already-imported QIIME 2 data¶

This tutorial begins with paired-end read sequencing data that has already been demultiplexed and imported into a QIIME 2 Artifact. Because sequence data can be delivered to you in many different forms, it’s not possible to cover the varieties here. Instead we refer you to How to import data for use with QIIME 2 to learn how to import your data. If you want to learn why importing is necessary, refer to Why importing is necessary.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
wget -O 'demux.qza' \
  'https://gut-to-soil-tutorial.readthedocs.io/en/2025.4/data/gut-to-soil/demux.qza'

Summarize demultiplexed sequences¶

When you have demultiplexed sequence data, the next step is typically to generate a visual summary of it. This allows you to determine how many sequences were obtained per sample, and also to get a summary of the distribution of sequence qualities at each position in your sequence data.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv

Upstream data analysis¶

Generally, the term “upstream” is used to refer to data analysis pre-feature-asv_table, and “downstream” is used to refer to data analysis post-feature-asv_table. Let’s jump into our upstream analysis.

Sequence quality control and feature table construction¶

QIIME 2 plugins are available for several quality control methods, including DADA2, Deblur, and basic quality-score-based filtering. In this tutorial we present this step using DADA2. The result of this method will be a FeatureTable[Frequency] QIIME 2 artifact, which contains counts (frequencies) of each unique sequence in each sample in the dataset, and a FeatureData[Sequence] QIIME 2 artifact, which maps feature identifiers in the FeatureTable to the sequences they represent.

DADA2 is a pipeline for detecting and correcting (where possible) Illumina amplicon sequence data. As implemented in the dada2 plugin, this quality control process will additionally filter any phiX reads (commonly present in marker gene Illumina sequence data) that are identified in the sequencing data, filter chimeric sequences, and merge paired end reads.

The denoise-paired action, which we’ll use here, requires four parameters that are used in quality filtering:

  • trim-left-f a, which trims off the first a bases of each forward read
  • trunc-len-f b which truncates each forward read at position b
  • trim-left-r c, which trims off the first c bases of each forward read
  • trunc-len-r d which truncates each forward read at position d This allows the user to remove low quality regions of the sequences. To determine what values to pass for these parameters, you should review the Interactive Quality Plot tab in the demux.qzv file that was generated above.
Solution to Exercise 2

The quality of the initial bases seems to be high, so I choose not to trim any bases from the beginning of the sequences. The quality also seems good all the way out to the end, though maybe dropping off after 250 bases. I’ll therefore truncate at 250. I’ll keep these values the same for both the forward and reverse reads, though that is not a requirement.

Now run your DADA2 command. This step may take up to 10 minutes to complete - it’s the longest running step in this tutorial.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trunc-len-f 250 \
  --p-trim-left-r 0 \
  --p-trunc-len-r 250 \
  --o-representative-sequences asv-seqs.qza \
  --o-table asv-table.qza \
  --o-denoising-stats stats.qza

One of the outputs created by DADA2 is a summary of the denoising run. That is generated as an Artifact, so can’t be viewed directly. However this is one of many QIIME 2 types that can be viewed as Metadata - a very powerful concept that we’ll use again later in this tutorial. Learning to view artifacts as Metadata creates nearly infinite possibilities for how you can explore your microbiome data with QIIME 2.

Here, we’ll again use the metadata plugins tabulate visualizer, but this time we’ll apply it to the DADA2 statistics.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime metadata tabulate \
  --m-input-file stats.qza \
  --o-visualization stats.qzv
Solution to Exercise 4
[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime metadata tabulate \
  --m-input-file sample-metadata.tsv stats.qza \
  --o-visualization sample-metadata-w-dada2-stats.qzv

Feature table and feature data summaries¶

After DADA2 completes, you’ll want to explore the resulting data. You can do this using the following two commands, which will create visual summaries of the data. The feature-table summarize action command will give you information on how many sequences are associated with each sample and with each feature, histograms of those distributions, and some related summary statistics.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime feature-table summarize-plus \
  --i-table asv-table.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-summary asv-table.qzv \
  --o-sample-frequencies sample-frequencies.qza \
  --o-feature-frequencies asv-frequencies.qza

The feature-table tabulate-seqs action command will provide a mapping of feature IDs to sequences, and provide links to easily BLAST each sequence against the NCBI nt database. We can also include the feature frequency information in this visualization by passing it as metadata, similar to how we merged metadata in the exercise above. In this case, however, we’re looking a feature metadata, as opposed to sample metadata. As far as QIIME 2 is concerned, there is no difference between these two - in our case, it’ll only be the identifiers that differ.

This visualization will be very useful later in the tutorial, when you want to learn more about specific features that are important in the data set.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime feature-table tabulate-seqs \
  --i-data asv-seqs.qza \
  --m-metadata-file asv-frequencies.qza \
  --o-visualization asv-seqs.qzv

Filtering features from a feature table¶

If you review the tabulated feature sequences, or the feature detail table of the feature table summary, you’ll notice that there are many sequences that are observed in only a single sample. Let’s filter those out to reduce the number of sequences we’re working with - this will speed up several slower steps that are coming up.

This is a two-step process. First we filter our feature table, and then we use the new feature table to filter our sequences to only the ones that are contained in the new table.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime feature-table filter-features \
  --i-table asv-table.qza \
  --p-min-samples 2 \
  --o-filtered-table asv-table-ms2.qza
[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime feature-table filter-seqs \
  --i-data asv-seqs.qza \
  --i-table asv-table-ms2.qza \
  --o-filtered-data asv-seqs-ms2.qza
Solution to Exercise 7

Here’s the command you would use:

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime feature-table summarize-plus \
  --i-table asv-table-ms2.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-summary asv-table-ms2.qzv \
  --o-sample-frequencies sample-frequencies-ms2.qza \
  --o-feature-frequencies asv-frequencies-ms2.qza

Be sure to run this as we’re going to use one of the results below.

Taxonomic annotation¶

Before we complete our upstream analysis steps, we’ll generate taxonomic annotations for our sequences using the feature-classifier plugin.

First, we’ll download a pre-trained classifier artifact.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
wget -O 'suboptimal-16S-rRNA-classifier.qza' \
  'https://gut-to-soil-tutorial.readthedocs.io/en/2025.4/data/gut-to-soil/suboptimal-16S-rRNA-classifier.qza'

Then, we’ll apply it to our sequences using classify-sklearn.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime feature-classifier classify-sklearn \
  --i-classifier suboptimal-16S-rRNA-classifier.qza \
  --i-reads asv-seqs-ms2.qza \
  --o-classification taxonomy.qza

Then, to get an initial look at our taxonomic classifications, let’s integrate taxonomy in the sequence summary, like the one we generated above.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime feature-table tabulate-seqs \
  --i-data asv-seqs-ms2.qza \
  --i-taxonomy Greengenes_13_8:taxonomy.qza \
  --m-metadata-file asv-frequencies-ms2.qza \
  --o-visualization asv-seqs-ms2.qzv

Building a tree for phylogenetic diversity calculations¶

QIIME supports several phylogenetic diversity metrics, including Faith’s Phylogenetic Diversity and weighted and unweighted UniFrac. In addition to counts of features per sample (i.e., the data in the FeatureTable[Frequency] QIIME 2 artifact), these metrics require a rooted phylogenetic tree relating the features to one another. The amplicon distribution offers a few ways to build these trees, including a reference-based approach in the fragment-insertion plugin and de novo (i.e., reference-free) approaches in the phylogeny plugin.

The reference based approach, by default, is specific to 16S rRNA marker gene analysis. We could use that here, but the runtime is too long for our documentation.[1] If you’d like to see this demonstrated, you can refer to the Parkinson’s Mouse tutorial.

The de novo approach is known to generate low quality trees when very short sequences are used as input, but can be used with any phylogenetically informative marker gene (not just 16S). If you’d like to see this demonstrated, you can refer to the Moving Pictures tutorial.

For those reasons, we’re going to skip building phylogenetic trees and instead use an analog of phylogenetic diversity metrics here.

Downstream data analysis¶

As mentioned above, we tend to think of “downstream” analysis as beginning with a feature table, taxonomic annotation of our features, and optionally a phylogenetic tree. Now that we have those (with the exception of the tree, which we won’t use here), let’s jump in. This is where it starts to get fun! ⛷

Kmer-based diversity analysis¶

As mentioned above, we’re going to skip building phylogenetic trees and instead use an analog of phylogenetic diversity metrics here. This will use two stand-alone QIIME 2 plugins, q2-boots and q2-kmerizer, which are integrated through the kmer-diversity action in q2-boots. To learn about kmerization of features and how this relates to phylogenetic diversity metrics, read the q2-kmerizer paper. q2-boots provides actions that mirror the interface of the diversity metric calculation actions in the diversity plugin, but generates more robust results because it integrates rarefaction and/or bootstrapping. You can learn more about this in the q2-boots paper.

kmer-diversity is a qiime2.Pipeline: a type of action that links multiple other QIIME 2 actions together for convenience. As such, it does a lot of work. Here are the steps that it takes:

  1. Resample the input feature table (i.e., the ASV table) to contain exactly sampling-depth sequences per sample, either by bootstrapping or rarefaction, n times. Samples with fewer than sampling-depth sequences will be removed from the feature table and not included in the subsequent steps. This will result in n feature tables.
  2. For each feature table resulting from step 1, using the input sequences (i.e., the ASV sequences), kmerize all sequences into kmers of length kmer-size. [2] Use this information to create one kmer table per resampled feature table. This will result in n feature tables, where the features are kmers (instead of ASVs, as in the input feature table).
  3. Compute the user-requested alpha- and beta-diversity metrics on each of the kmer tables resulting from step 2. [3] The metrics computed by default are:
    • Alpha diversity
      • Shannon’s diversity index (a quantitative measure of community richness)
      • Observed Features (a qualitative measure of community richness)
      • Evenness (i.e., Pielou’s Evenness; a measure of community evenness)
    • Beta diversity
      • Jaccard distance (a qualitative measure of community dissimilarity)
      • Bray-Curtis distance (a quantitative measure of community dissimilarity)
  4. For each diversity metric, average the results computed across the n kmer tables. These results can be used in subsequent analysis steps (e.g., ordination, statistical modeling, machine learning).
  5. Perform PCoA ordination on the averaged beta diversity distance matrices resulting from Step 4. [4]
  6. Generate an interactive q2-vizard scatter plot that contains all user-provided sample metadata, all averaged alpha diversity metrics, and the first three ordination axes for all PCoA matrices computed in step 5.

A key parameter that needs to be provided to kmer-diversity is sampling-depth, which is the even sampling (i.e., bootstrapping or rarefaction) depth. Because most diversity metrics are sensitive to different sampling depths (i.e., sequence counts) across different samples, the tables are randomly subsampled such that the total frequency for each sample is the user-specified sampling depth. For example, if you set sampling-depth=500, this step will subsample the sequence counts in each sample so that each sample in the resulting table has a total frequency of 500. If the total frequency (i.e., the number of sequences observed) for any sample(s) is smaller than this value, those samples will be dropped from the diversity analysis. Choosing this value is tricky. We recommend making your choice by reviewing the information presented in the asv-table-ms2.qzv file that you created above.

I’m going to choose values that is around the first quartile of the sample total frequencies.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime boots kmer-diversity \
  --i-table asv-table-ms2.qza \
  --i-sequences asv-seqs-ms2.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-sampling-depth 96 \
  --p-n 10 \
  --p-replacement \
  --p-alpha-average-method median \
  --p-beta-average-method medoid \
  --output-dir boots-kmer-diversity
  • boots-kmer-diversity/resampled_tables | download
  • boots-kmer-diversity/kmer_tables | download
  • boots-kmer-diversity/alpha_diversities | download
  • boots-kmer-diversity/distance_matrices | download
  • boots-kmer-diversity/pcoas | download
  • boots-kmer-diversity/scatter_plot.qzv | download | view

After computing diversity metrics, we can begin to explore the microbial composition of the samples in the context of the sample metadata. You can review the sample metadata using one of the tabulated views of this file that we created above.

Alpha rarefaction plotting¶

In this section we’ll explore alpha diversity as a function of sampling depth using thealpha-rarefaction action. This visualizer computes one or more alpha diversity metrics at multiple sampling depths, in steps between 1 (optionally controlled with min-depth) and the value provided as max-depth. At each sampling depth step, 10 rarefied tables will be generated, and the diversity metrics will be computed for all samples in the tables. The number of iterations (rarefied tables computed at each sampling depth) can be controlled with the iterations parameter. Average diversity values will be plotted for each sample at each even sampling depth, and samples can be grouped based on metadata in the resulting visualization if sample metadata is provided.

The value that you provide for max-depth should be determined by reviewing the “Frequency per sample” information presented in the asv-table-ms2.qzv file. In general, choosing a value that is somewhere around the median frequency seems to work well, but you may want to increase that value if the lines in the resulting rarefaction plot don’t appear to be leveling out, or decrease that value if you seem to be losing many of your samples due to low total frequencies closer to the minimum sampling depth than the maximum sampling depth.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime diversity alpha-rarefaction \
  --i-table asv-table-ms2.qza \
  --p-max-depth 260 \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization alpha-rarefaction.qzv

The visualization will have two plots. The top plot is an alpha rarefaction plot, and is primarily used to determine if the richness of the samples has been fully observed or sequenced. If the lines in the plot appear to “level out” (i.e., approach a slope of zero) at some sampling depth along the x-axis, that suggests that collecting additional sequences beyond that sampling depth would not be likely to result in the observation of additional features. If the lines in the plot don’t level out, this may be because the richness of the samples hasn’t been fully observed yet (because too few sequences were collected), or it could be an indicator that a lot of sequencing error remains in the data (which is being mistaken for novel diversity).

The bottom plot in this visualization is important when grouping samples by metadata. It illustrates the number of samples that remain in each group when the feature table is rarefied to each sampling depth. If a given sampling depth d is larger than the total frequency of a sample s (i.e., the number of sequences that were obtained for sample s), it is not possible to compute the diversity metric for sample s at sampling depth d. If many of the samples in a group have lower total frequencies than d, the average diversity presented for that group at d in the top plot will be unreliable because it will have been computed on relatively few samples. When grouping samples by metadata, it is therefore essential to look at the bottom plot to ensure that the data presented in the top plot is reliable.

Taxonomic analysis¶

Next, we can view the taxonomic composition of our samples with interactive bar plots. Generate those plots with the following command and then open the visualization.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime taxa barplot \
  --i-table asv-table-ms2.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization taxa-bar-plots.qzv
Solution to Exercise 13

We use the ASV table here because the feature ids in that table are the same as the ones used in the FeatureData[Taxonomy] artifact. Additionally, kmerization of our data is a tool used for computing diversity metrics - not something we generally intend to use throughout our analyses.

Differential abundance testing with ANCOM-BC2¶

ANCOM-BC2 is a compositionally-aware linear regression model that allows testing for differentially abundant features across sample groups while also implementing bias correction. This can be accessed using the ancombc2 action in the composition plugin.

We’ll perform this analysis in a few steps.

Filter samples from the feature table¶

First, we’ll filter our samples from our feature table such that we only have the three groups that we have the most samples for.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime feature-table filter-samples \
  --i-table asv-table-ms2.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where '[SampleType] IN ("Human Excrement Compost", "Human Excrement", "Food Compost")' \
  --o-filtered-table asv-table-ms2-dominant-sample-types.qza

Apply differential abundance testing¶

Then, we’ll apply ANCOM-BC2 to see which ASV are differentially abundant across those sample types. I specify a reference level here as this defines what each group is compared against. Since the focus of this study is HEC, I choose that as my reference level. That will let us see what ASVs are over- or under-represented in the other two sample groups (Human Excrement and Food Compost) relative to HEC, as HEC defines the “global intercept” that will be measured against.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime composition ancombc2 \
  --i-table asv-table-ms2-dominant-sample-types.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-fixed-effects-formula SampleType \
  --p-reference-levels 'SampleType::Human Excrement Compost' \
  --o-ancombc2-output ancombc2-results.qza

Finally, we’ll visualize the results. Taxonomic annotations can optionally be provided here: this helps make ASVs ids more interpretable.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime composition ancombc2-visualizer \
  --i-data ancombc2-results.qza \
  --i-taxonomy taxonomy.qza \
  --o-visualization ancombc2-barplot.qzv
Solution to Exercise 16

To collapse our ASVs into genera (i.e. level 6 of the Greengenes taxonomy), we can use the following command.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime taxa collapse \
  --i-table asv-table-ms2-dominant-sample-types.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 6 \
  --o-collapsed-table genus-table-ms2-dominant-sample-types.qza

We can then provide the resulting table as the input to ANCOM-BC2.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime composition ancombc2 \
  --i-table genus-table-ms2-dominant-sample-types.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-fixed-effects-formula SampleType \
  --p-reference-levels 'SampleType::Human Excrement Compost' \
  --o-ancombc2-output genus-ancombc2-results.qza

And finally, we can visualize the results. Notice that in this case we’re not providing the taxonomy, because we’ve already intergrated that information by collapsing at the genus level.

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
qiime composition ancombc2-visualizer \
  --i-data genus-ancombc2-results.qza \
  --o-visualization genus-ancombc2-barplot.qzv

That’s it for now, but more is coming soon!¶

In the near future (as of 17 April 2025) we plan to integrate analyses using the sample-classifier plugin and longitudinal plugin. In the meantime, here are some suggestions to continue your learning:

  1. Build a machine learning classifier that classifies samples accordining to the three dominant sample types in the feature table that we used with ANCOM-BC. (Hint: see classify-samples.)
  2. Perform a longitudinal analysis that tracks samples from different buckets over time. Which taxa change most over time? (Hint: see feature-volatility.)
  3. Remember that the full data set (five sequencing runs) are available in the gut-to-soil Artifact Repository. Grab one of the larger sequencing runs (we worked with a small sequencing run that was generated as a preliminary test), and adapt the commands in this tutorial to work on a bigger data set.

We’re also in the process of refactoring our statistical methods for assessing alpha and beta diversity across groups, using the new stats plugin. We’re therefore holding off on integrating statistical analysis until we have that ready. In the meantime, you can refer to you can refer to the Moving Pictures tutorial, as well as the sample-classifier and longitudinal tutorials.

Replay provenance (work in progress!)¶

You might next want to try to adapt the commands presented in this tutorial to your own data, adjusting parameter settings and metadata column headers as is relevant. QIIME 2’s provenance replay functionality can help with this. Assuming that you ran all of the steps above in a directory called gut-to-soil/, run the following command to generate a template script that you can adapt for your workflow:

qiime tools replay-provenance \
  --in-fp gut-to-soil/taxa-bar-plots.qzv \
  --out-fp g2s-replayed.bash

If you need help, head over to the QIIME 2 Forum.

Footnotes¶
  1. The resource requirements exceed those provided by the Read the Docs (RTD) build system, which is used to build the documentation that you’re reading. RTD provides systems with 7GB of RAM for 30 minutes maximum to build documentation. That’s a very reasonable (and generous) allocation for building documentation, so we choose to work within those contraints rather than creating our own documentation build system like we’ve had in the past (e.g., for https://docs.qiime2.org).

  2. kmerization of biological sequences is described in the Database Searching chapter of An Introduction to Applied Bioinformatics.

  3. Learn more about the available metrics in this QIIME 2 Forum post.

References¶
  1. Raspet, I., Gehret, E., Herman, C., Meilander, J., Manley, A., Simard, A., Bolyen, E., & Caporaso, J. G. (2025). Facilitating bootstrapped and rarefaction-based microbiome diversity analysis with q2-boots. F1000Research, 14, 87. 10.12688/f1000research.156295.1
  2. Bokulich, N. A. (2025). Integrating sequence composition information into microbial diversity analyses with k-mer frequency counting. mSystems, 10(3). 10.1128/msystems.01550-24
  3. Meilander, J., Herman, C., Manley, A., Augustine, G., Birdsell, D., Bolyen, E., Celona, K. R., Coffey, H., Cocking, J., Donoghue, T., Draves, A., Erickson, D., Foley, M., Gehret, L., Hagen, J., Hepp, C., Ingram, P., John, D., Kadar, K., 
 Caporaso, J. G. (2024). Upcycling Human Excrement: The Gut Microbiome to Soil Microbiome Axis. arXiv. 10.48550/ARXIV.2411.04148
  4. Caporaso, J. G., & Meilander, J. (2024). Upcycling Human Excrement: The Gut Microbiome to Soil Microbiome Axis (supporting data). Zenodo. 10.5281/ZENODO.13887456
  5. Keefe, C. R., Dillon, M. R., Gehret, E., Herman, C., Jewell, M., Wood, C. V., Bolyen, E., & Caporaso, J. G. (2023). Facilitating bioinformatics reproducibility with QIIME 2 Provenance Replay. PLOS Computational Biology, 19(11), e1011676. 10.1371/journal.pcbi.1011676
  6. Caporaso, J. G., Lauber, C. L., Walters, W. A., Berg-Lyons, D., Huntley, J., Fierer, N., Owens, S. M., Betley, J., Fraser, L., Bauer, M., Gormley, N., Gilbert, J. A., Smith, G., & Knight, R. (2012). Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. The ISME Journal, 6(8), 1621–1624. 10.1038/ismej.2012.8
  7. Callahan, B. J., McMurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J. A., & Holmes, S. P. (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), 581–583. 10.1038/nmeth.3869
  8. Amir, A., McDonald, D., Navas-Molina, J. A., Kopylova, E., Morton, J. T., Zech Xu, Z., Kightley, E. P., Thompson, L. R., Hyde, E. R., Gonzalez, A., & Knight, R. (2017). Deblur Rapidly Resolves Single-Nucleotide Community Sequence Patterns. mSystems, 2(2). 10.1128/msystems.00191-16
  9. Bokulich, N. A., Subramanian, S., Faith, J. J., Gevers, D., Gordon, J. I., Knight, R., Mills, D. A., & Caporaso, J. G. (2013). Quality-filtering vastly improves diversity estimates from Illumina amplicon sequencing. Nature Methods, 10(1), 57–59. 10.1038/nmeth.2276
  10. Kaehler, B. D., Bokulich, N. A., McDonald, D., Knight, R., Caporaso, J. G., & Huttley, G. A. (2019). Species abundance information improves sequence taxonomy classification accuracy. Nature Communications, 10(1). 10.1038/s41467-019-12669-6
  11. Lin, H., & Peddada, S. D. (2023). Multigroup analysis of compositions of microbiomes with covariate adjustments and repeated measures. Nature Methods, 21(1), 83–91. 10.1038/s41592-023-02092-7