Quantifying uncertainty of taxonomic placement in DNA barcoding and metabarcoding

Somervuo, Panu, Yu, Douglas W., Xu, Charles C.Y., Ji, Yinqiu, Hultman, Jenni, Wirta, Helena and Ovaskainen, Otso (2017) Quantifying uncertainty of taxonomic placement in DNA barcoding and metabarcoding. Methods in Ecology and Evolution, 8. pp. 398-407. ISSN 2041-210X

PDF (Accepted manuscript) - Submitted Version

Abstract

A crucial step in the use of DNA markers for biodiversity surveys is the assignment of Linnaean taxonomies (species, genus, etc.) to sequence reads. This assignment makes available all the information that is linked to the taxonomic names. Taxonomic placement of DNA barcoding sequences is inherently probabilistic because DNA sequences contain errors, because there is natural variation among sequences within a species, and because reference databases are incomplete and can contain false annotations. However, most existing bioinformatics methods for taxonomic placement either ignore uncertainty or quantify it using metrics other than probability. In this paper we evaluate the performance of the recently proposed probabilistic taxonomic placement method PROTAX by applying it both to annotated reference sequence data and to unknown environmental data. Our four case studies include contrasting taxonomic groups (fungi, bacteria, mammals and insects), variation in the length and quality of the barcoding sequences (from individually Sanger-sequenced sequences to short Illumina reads), variation in the structures and sizes of the taxonomies (800–130 000 species) and variation in the completeness of the reference databases (representing 15–100% of known species). Our results demonstrate that PROTAX yields essentially unbiased probabilities of taxonomic placement, which means that its quantification of species identification uncertainty is reliable. As expected, the accuracy of taxonomic placement increases with increasing coverage of taxonomic and reference sequence databases, and with an increasing ratio of genetic variation among taxonomic levels to that within taxonomic levels. We conclude that reliable species-level identification from environmental samples is still challenging and that neglecting identification uncertainty can lead to spurious inference. A key aim for future research is to complete taxonomic and reference sequence databases and to make these two types of data compatible.
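
To illustrate what "essentially unbiased probabilities" means in practice, the following is a minimal sketch of a calibration check: among placements assigned a probability near p, a fraction of about p should turn out to be correct. This is not PROTAX code and does not use the paper's data. The function name `calibration_table`, the number of bins, and the simulated inputs are all assumptions made for illustration.

```python
import random


def calibration_table(probs, correct, n_bins=5):
    """Group predictions into equal-width probability bins and compare the
    mean predicted probability with the observed fraction of correct calls.

    Hypothetical helper for illustration; not part of PROTAX.
    Returns a list of (count, mean_predicted, fraction_correct) per non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, c in zip(probs, correct):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, c))
    rows = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        frac_correct = sum(c for _, c in members) / len(members)
        rows.append((len(members), mean_p, frac_correct))
    return rows


# Simulated, perfectly calibrated classifier (an assumption, not real data):
# each placement is correct with exactly its stated probability.
rng = random.Random(0)
probs = [rng.random() for _ in range(20000)]
correct = [rng.random() < p for p in probs]

rows = calibration_table(probs, correct)
for n, mean_p, frac in rows:
    print(f"n={n:5d}  mean predicted={mean_p:.2f}  observed correct={frac:.2f}")
```

For a well-calibrated method the two printed columns should agree closely in every bin. A systematic gap, for example observed accuracy well below the predicted probability in the top bin, would indicate overconfident placements.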

Item Type: Article
Additional Information: Special Feature: Technological Advances at the Interface between Ecology and Statistics
Uncontrolled Keywords: DNA barcoding, DNA metabarcoding, molecular species identification, multinomial regression, statistical model, taxonomic assignment, taxonomic placement
Faculty \ School: Faculty of Science > School of Biological Sciences
Depositing User: Pure Connector
Date Deposited: 13 Apr 2017 05:09
Last Modified: 17 Mar 2020 23:15
URI: https://ueaeprints.uea.ac.uk/id/eprint/63250
DOI: 10.1111/mee3.2017.8.issue-4
