Abstract
Metagenomic experiments expose the wide range of microscopic organisms in any microbial environment through high-throughput DNA sequencing. The computational analysis of the sequencing data is critical for the accurate and complete characterization of the microbial community. To facilitate efficient and reproducible metagenomic analysis, we introduce a step-by-step protocol for the Kraken suite, an end-to-end pipeline for the classification, quantification and visualization of metagenomic datasets. Our protocol describes the execution of the Kraken programs, via a sequence of easy-to-use scripts, in two scenarios: (1) quantification of the species in a given metagenomics sample; and (2) detection of a pathogenic agent from a clinical sample taken from a human patient. The protocol, which is executed within 1–2 h, is targeted to biologists and clinicians working in microbiome or metagenomics analysis who are familiar with the Unix command-line environment.
Data availability
The microbiome analysis used three samples from Taur et al. (ref. 8), and the pathogen identification used ten samples from Li et al. (ref. 9), all of which are available from NCBI under their SRA IDs. Source data are provided with this paper.
Code availability
The following website describes and links to all software and databases used in this protocol: http://ccb.jhu.edu/data/kraken2_protocol/. We also provide easy-to-use Jupyter notebooks for both workflows, which can be executed in the browser using Google Colab: https://github.com/martin-steinegger/kraken-protocol/.
Change history
29 August 2024
A Correction to this paper has been published: https://doi.org/10.1038/s41596-024-01064-1
References
Rappé, M. S. & Giovannoni, S. J. The uncultured microbial majority. Annu. Rev. Microbiol. 57, 369–394 (2003).
Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014).
Breitwieser, F. P., Baker, D. N. & Salzberg, S. L. KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biol. 19, 198 (2018).
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L. Bracken: estimating species abundance in metagenomics data. PeerJ Comput. Sci. 3, e104 (2017).
Breitwieser, F. P. & Salzberg, S. L. Pavian: interactive analysis of metagenomics data for microbiome studies and pathogen identification. Bioinformatics 36, 1303–1304 (2020).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Taur, Y. et al. Reconstitution of the gut microbiota of antibiotic-treated patients by autologous fecal microbiota transplant. Sci. Transl. Med. 10, eaap9489 (2018).
Li, Z. et al. Identifying corneal infections in formalin-fixed specimens using next generation sequencing. Invest. Ophthalmol. Vis. Sci. 59, 280–288 (2018).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Ounit, R., Wanamaker, S., Close, T. J. & Lonardi, S. CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genomics 16, 236 (2015).
Kim, D., Song, L., Breitwieser, F. P. & Salzberg, S. L. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26, 1721–1729 (2016).
Menzel, P., Ng, K. L. & Krogh, A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7, 11257 (2016).
Ye, S. H., Siddle, K. J., Park, D. J. & Sabeti, P. C. Benchmarking metagenomics tools for taxonomic classification. Cell 178, 779–794 (2019).
Seppey, M., Manni, M. & Zdobnov, M. LEMMI: a continuous benchmarking platform for metagenomics classifiers. Genome Res. 30, 1208–1216 (2020).
Segata, N. et al. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat. Methods 9, 811–814 (2012).
Vervier, K., Mahé, P., Tournoud, M., Veyrieras, J. B. & Vert, J. P. Large-scale machine learning for metagenomics sequence classification. Bioinformatics 32, 1023–1032 (2016).
Luo, Y., Yu, Y. W., Zeng, J., Berger, B. & Peng, J. Metagenomic binning through low-density hashing. Bioinformatics 35, 219–226 (2019).
Breitwieser, F. P., Lu, J. & Salzberg, S. L. A review of methods and databases for metagenomic classification and assembly. Brief. Bioinform. 20, 1125–1136 (2017).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv https://doi.org/10.48550/arXiv.1303.3997 (2013).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Stephens, Z. et al. Exogene: a performant workflow for detecting viral integrations from paired-end next-generation sequencing data. PLoS ONE 16, e0250915 (2021).
Breitwieser, F. P., Pertea, M., Zimin, A. V. & Salzberg, S. L. Human contamination in bacterial genomes has created thousands of spurious proteins. Genome Res. 29, 954–960 (2019).
Steinegger, M. & Salzberg, S. L. Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biol. 21, 115 (2020).
Lu, J. & Salzberg, S. L. Removing contaminants from databases of draft genomes. PLoS Comput. Biol. 14, e1006277 (2018).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Mirdita, M., Steinegger, M., Breitwieser, F., Söding, J. & Levy Karin, E. Fast and sensitive taxonomic assignment to metagenomic contigs. Bioinformatics 37, 3029–3031 (2021).
Nasko, D. J., Koren, S., Phillippy, A. M. & Treangen, T. J. RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification. Genome Biol. 19, 165 (2018).
Yang, C. et al. A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. Comput. Struct. Biotechnol. J. 19, 6301–6314 (2021).
Whittaker, R. H. Evolution and measurement of species diversity. Taxon 21, 213–251 (1972).
Berger, W. H. & Parker, F. L. Diversity of planktonic foraminifera in deep-sea sediments. Science 168, 1345–1347 (1970).
Fisher, R. A., Corbet, A. S. & Williams, C. B. The relation between the number of species and the number of individuals in a random sample of an animal population. J. Anim. Ecol. 12, 42–58 (1943).
Simpson, E. H. Measurement of diversity. Nature 163, 688–688 (1949).
Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).
Bray, J. R. & Curtis, J. T. An ordination of the upland forest communities of southern Wisconsin. Ecol. Monogr. 27, 325–349 (1957).
Ondov, B. D., Bergman, N. H. & Phillippy, A. M. Interactive metagenomic visualization in a web browser. BMC Bioinform. 12, 385 (2011).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Grüning, B. et al. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat. Methods 15, 475–476 (2018).
Acknowledgements
Indexes for tools in the Kraken suite, including the indexes used in this protocol, are made freely available on Amazon Web Services thanks to the AWS Public Dataset Program. B.L. was supported by NIH/NIGMS grant R35GM139602. S.L.S. was supported by NIH grants R35-GM130151 and R01-HG006677. M.S. acknowledges support from the National Research Foundation of Korea grants (2019R1A6A1A10073437, 2020M3A9G7103933, 2021R1C1C102065 and 2021M3A9I4021220); New Faculty Startup Fund; and the Creative-Pioneering Researchers Program through Seoul National University.
Author information
Authors and Affiliations
Contributions
J.L. and M.S. led the development of the protocol. N.R. designed and executed the microbiome analysis protocol and is the author of the KrakenTools α-diversity tools. J.L. developed the pathogen identification protocol and is the author of Bracken and KrakenTools. M.S. authored the Jupyter notebooks for the protocol. D.E.W. is the senior author of Kraken and Kraken 2. F.B. is the author of KrakenUniq. C.P. is an author of the KrakenTools β-diversity script. B.L. supervised the development of Kraken 2. S.L.S. supervised the development of Kraken, KrakenUniq and Bracken. B.L. and S.L.S. supervised the development of this protocol. All authors contributed to the writing of the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Protocols thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
Salzberg, S. et al. Neurol. Neuroimmunol. Neuroinflamm. 3, e251 (2016): https://doi.org/10.1212/NXI.0000000000000251
Wood, D. et al. Genome Biol. 15, R46 (2014): https://doi.org/10.1186/gb-2014-15-3-r46
Lu, J. et al. PeerJ Comput. Sci. 3, e104 (2017): https://doi.org/10.7717/peerj-cs.104
Breitwieser, F. et al. Genome Biol. 19, 198 (2018): https://doi.org/10.1186/s13059-018-1568-0
Wood, D. et al. Genome Biol. 20, 257 (2019): https://doi.org/10.1186/s13059-019-1891-0
Breitwieser, F. et al. Bioinformatics 36, 1303–1304 (2020): https://doi.org/10.1093/bioinformatics/btz715
Key data used in this protocol
Taur, Y. et al. Sci. Transl. Med. 10, eaap9489 (2018): https://doi.org/10.1126/scitranslmed.aap9489
Li, Z. et al. Invest. Ophthalmol. Vis. Sci. 59, 280–288 (2018): https://doi.org/10.1167/iovs.17-21617
Supplementary information
Supplementary Table 1
Supplementary Table 2
Source data
Source Data Fig. 2
Breport text for plotting the Sankey diagram, and Krona counts for plotting Krona plots.
Source Data Fig. 6
Alpha diversity table text, Bray–Curtis equation text, and heat map values for beta diversity.
Source Data Fig. 7
Pathogen sample species heat map data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lu, J., Rincon, N., Wood, D.E. et al. Metagenome analysis using the Kraken software suite. Nat Protoc 17, 2815–2839 (2022). https://doi.org/10.1038/s41596-022-00738-y
DOI: https://doi.org/10.1038/s41596-022-00738-y