Metagenome analysis using the Kraken software suite

Lu, Jennifer; Rincon, Natalia; Wood, Derrick E.; Breitwieser, Florian P.; Pockrandt, Christopher; Langmead, Ben; Salzberg, Steven L.; Steinegger, Martin

doi:10.1038/s41596-022-00738-y

Protocol
Published: 28 September 2022

Nature Protocols volume 17, pages 2815–2839 (2022)

An Author Correction to this article was published on 29 August 2024

This article has been updated

Abstract

Metagenomic experiments expose the wide range of microscopic organisms in any microbial environment through high-throughput DNA sequencing. The computational analysis of the sequencing data is critical for the accurate and complete characterization of the microbial community. To facilitate efficient and reproducible metagenomic analysis, we introduce a step-by-step protocol for the Kraken suite, an end-to-end pipeline for the classification, quantification and visualization of metagenomic datasets. Our protocol describes the execution of the Kraken programs, via a sequence of easy-to-use scripts, in two scenarios: (1) quantification of the species in a given metagenomics sample; and (2) detection of a pathogenic agent from a clinical sample taken from a human patient. The protocol, which is executed within 1–2 h, is targeted to biologists and clinicians working in microbiome or metagenomics analysis who are familiar with the Unix command-line environment.
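
As a rough illustration of the classification and quantification stages named above, the following Python sketch only assembles command lines for Kraken 2 and Bracken; it does not run them and is not the protocol's own script. The database path, file names, thread count, read length and read threshold are hypothetical placeholders, and the flags are those of the public `kraken2` and `bracken` command-line tools as commonly documented, so check them against the installed versions.

```python
import shlex
from typing import List


def kraken2_command(db: str, reads_1: str, reads_2: str,
                    report: str, output: str, threads: int = 8) -> List[str]:
    """Build a Kraken 2 call that classifies one paired-end sample."""
    return [
        "kraken2",
        "--db", db,                 # path to a prebuilt Kraken 2 database (placeholder)
        "--threads", str(threads),
        "--report", report,         # per-taxon summary report
        "--output", output,         # per-read classifications
        "--paired", reads_1, reads_2,
    ]


def bracken_command(db: str, kraken_report: str, output: str,
                    read_length: int = 100, level: str = "S",
                    threshold: int = 10) -> List[str]:
    """Build a Bracken call that re-estimates abundances from a Kraken 2 report."""
    return [
        "bracken",
        "-d", db,                   # same database directory used for classification
        "-i", kraken_report,        # Kraken 2 report to read
        "-o", output,               # Bracken abundance table to write
        "-r", str(read_length),     # read length the Bracken index was built for
        "-l", level,                # taxonomic level, "S" for species
        "-t", str(threshold),       # minimum reads a taxon needs before re-estimation
    ]


if __name__ == "__main__":
    # Hypothetical sample and database names, used only for display.
    classify = kraken2_command("k2_db", "sample_1.fastq", "sample_2.fastq",
                               "sample.k2report", "sample.kraken2")
    quantify = bracken_command("k2_db", "sample.k2report", "sample.bracken")
    for cmd in (classify, quantify):
        print(shlex.join(cmd))
```

Keeping each command as a list of arguments makes it straightforward to hand to `subprocess.run` later without shell-quoting problems.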

**Fig. 3: Pavian output for hierarchical visualization.**

**Fig. 4: Pavian output for pathogen identification.**

**Fig. 7: Pathogen identification results.**

Data availability

The microbiome analysis used three samples from Taur et al.⁸, and the pathogen identification used ten samples from Li et al.⁹, all of which can be found on NCBI with their SRA IDs. Source data are provided with this paper.

Code availability

The following website describes and links to all software and databases used in this protocol: http://ccb.jhu.edu/data/kraken2_protocol/. We also provide easy-to-use Jupyter notebooks for both workflows, which can be executed in the browser using Google Colab: https://github.com/martin-steinegger/kraken-protocol/.

Change history

29 August 2024
A Correction to this paper has been published: https://doi.org/10.1038/s41596-024-01064-1

References

1. Rappé, M. S. & Giovannoni, S. J. The uncultured microbial majority. Annu. Rev. Microbiol. 57, 369–394 (2003).
2. Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014).
3. Breitwieser, F. P., Baker, D. N. & Salzberg, S. L. KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biol. 19, 198 (2018).
4. Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
5. Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L. Bracken: estimating species abundance in metagenomics data. PeerJ Comput. Sci. 3, e104 (2017).
6. Breitwieser, F. P. & Salzberg, S. L. Pavian: interactive analysis of metagenomics data for microbiome studies and pathogen identification. Bioinformatics 36, 1303–1304 (2020).
7. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
8. Taur, Y. et al. Reconstitution of the gut microbiota of antibiotic-treated patients by autologous fecal microbiota transplant. Sci. Transl. Med. 10, eaap9489 (2018).
9. Li, Z. et al. Identifying corneal infections in formalin-fixed specimens using next generation sequencing. Invest. Ophthalmol. Vis. Sci. 59(Jan), 280–288 (2018).
10. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215(Oct), 403–410 (1990).
11. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).
12. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
13. Ounit, R., Wanamaker, S., Close, T. J. & Lonardi, S. CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genomics 16, 236 (2015).
14. Kim, D., Song, L., Breitwieser, F. P. & Salzberg, S. L. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26, 1721–1729 (2016).
15. Menzel, P., Ng, K. L. & Krogh, A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7, 11257 (2016).
16. Ye, S. H., Siddle, K. J., Park, D. J. & Sabeti, P. C. Benchmarking metagenomics tools for taxonomic classification. Cell 178, 779–794 (2019).
17. Seppey, M., Manni, M. & Zdobnov, E. M. LEMMI: a continuous benchmarking platform for metagenomics classifiers. Genome Res. 30, 1208–1216 (2020).
18. Segata, N. et al. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat. Methods 9, 811–814 (2012).
19. Vervier, K., Mahé, P., Tournoud, M., Veyrieras, J. B. & Vert, J. P. Large-scale machine learning for metagenomics sequence classification. Bioinformatics 32, 1023–1032 (2016).
20. Luo, Y., Yu, Y. W., Zeng, J., Berger, B. & Peng, J. Metagenomic binning through low-density hashing. Bioinformatics 35, 219–226 (2019).
21. Breitwieser, F. P., Lu, J. & Salzberg, S. L. A review of methods and databases for metagenomic classification and assembly. Brief. Bioinform. 20, 1125–1136 (2017).
22. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv https://doi.org/10.48550/arXiv.1303.3997 (2013).
23. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
24. Stephens, Z. et al. Exogene: a performant workflow for detecting viral integrations from paired-end next-generation sequencing data. PLoS ONE 16, e0250915 (2021).
25. Breitwieser, F. P., Pertea, M., Zimin, A. V. & Salzberg, S. L. Human contamination in bacterial genomes has created thousands of spurious proteins. Genome Res. 29, 954–960 (2019).
26. Steinegger, M. & Salzberg, S. L. Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biol. 21, 115 (2020).
27. Lu, J. & Salzberg, S. L. Removing contaminants from databases of draft genomes. PLoS Comput. Biol. 14, e1006277 (2018).
28. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
29. Mirdita, M., Steinegger, M., Breitwieser, F., Söding, J. & Levy Karin, E. Fast and sensitive taxonomic assignment to metagenomic contigs. Bioinformatics 37, 3029–3031 (2021).
30. Nasko, D. J., Koren, S., Phillippy, A. M. & Treangen, T. J. RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification. Genome Biol. 19, 165 (2018).
31. Yang, C. et al. A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. Comput. Struct. Biotechnol. J. 19, 6301–6314 (2021).
32. Whittaker, R. H. Evolution and measurement of species diversity. Taxon 21, 213–251 (1972).
33. Berger, W. H. & Parker, F. L. Diversity of planktonic foraminifera in deep-sea sediments. Science 168, 1345–1347 (1970).
34. Fisher, R. A., Corbet, A. S. & Williams, C. B. The relation between the number of species and the number of individuals in a random sample of an animal population. J. Anim. Ecol. 12, 42–58 (1943).
35. Simpson, E. H. Measurement of diversity. Nature 163, 688–688 (1949).
36. Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).
37. Bray, J. R. & Curtis, J. T. An ordination of the upland forest communities of southern Wisconsin. Ecol. Monogr. 27, 325–349 (1957).
38. Ondov, B. D., Bergman, N. H. & Phillippy, A. M. Interactive metagenomic visualization in a web browser. BMC Bioinform. 12, 385 (2011).
39. Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
40. Grüning, B. et al. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat. Methods 15, 475–476 (2018).

Acknowledgements

Indexes for tools in the Kraken suite, including the indexes used in this protocol, are made freely available on Amazon Web Services thanks to the AWS Public Dataset Program. B.L. was supported by NIH/NIGMS grant R35GM139602. S.L.S. was supported by NIH grants R35-GM130151 and R01-HG006677. M.S. acknowledges support from National Research Foundation of Korea grants (2019R1A6A1A10073437, 2020M3A9G7103933, 2021R1C1C102065 and 2021M3A9I4021220); the New Faculty Startup Fund; and the Creative-Pioneering Researchers Program through Seoul National University.

Author information

These authors contributed equally: Jennifer Lu, Natalia Rincon.

Authors and Affiliations

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
Jennifer Lu, Natalia Rincon & Steven L. Salzberg
Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
Jennifer Lu, Natalia Rincon, Derrick E. Wood, Florian P. Breitwieser, Christopher Pockrandt & Steven L. Salzberg
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
Derrick E. Wood, Ben Langmead & Steven L. Salzberg
Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
Steven L. Salzberg
School of Biological Sciences and Institute of Molecular Biology & Genetics, Seoul National University, Seoul, Republic of Korea
Martin Steinegger

Contributions

J.L. and M.S. led the development of the protocol. N.R. designed and executed the microbiome analysis protocol and is the author of the KrakenTools α-diversity tools. J.L. developed the pathogen identification protocol and is the author of Bracken and KrakenTools. M.S. authored the Jupyter notebooks for the protocol. D.E.W. is the senior author of Kraken and Kraken 2. F.B. is the author of KrakenUniq. C.P. is an author of the KrakenTools β-diversity script. B.L. supervised the development of Kraken 2. S.L.S. supervised the development of Kraken, KrakenUniq and Bracken. B.L. and S.L.S. supervised the development of this protocol. All authors contributed to the writing of the manuscript.

Corresponding authors

Correspondence to Jennifer Lu or Martin Steinegger.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Protocols thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table 1

Supplementary Table 2

Source data

Source Data Fig. 2

Bracken report (breport) text for plotting the Sankey diagram, and Krona counts for plotting Krona plots.

Source Data Fig. 6

Alpha diversity table text, Bray–Curtis equation text, and heat map values for beta diversity.

Source Data Fig. 7

Pathogen sample species heat map data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lu, J., Rincon, N., Wood, D.E. et al. Metagenome analysis using the Kraken software suite. Nat Protoc 17, 2815–2839 (2022). https://doi.org/10.1038/s41596-022-00738-y

Received: 29 June 2021
Accepted: 16 June 2022
Published: 28 September 2022
Issue Date: December 2022
DOI: https://doi.org/10.1038/s41596-022-00738-y
