Grant, Alastair, Aleidan, Abdullah, Davies, Charli S., Udochi, Solomon C., Fritscher, Joachim, Bahram, Mohammad and Hildebrand, Falk (2025) KSGP 3.1: improved taxonomic annotation of Archaea communities using LotuS2, the genome taxonomy database and RNAseq data. ISME Communications, 5 (1). ISSN 2730-6151
PDF (ycaf094)
- Published Version
Available under License Creative Commons Attribution.
Abstract
Taxonomic annotation is a substantial challenge for Archaea metabarcoding. A limited number of reference sequences are available; a substantial fraction of phylogenetic diversity is not fully characterized; widely used databases do not reflect current archaeal taxonomy and contain mislabelled sequences. We address these gaps with a systematic and tractable approach based around the Genome Taxonomy Database (GTDB) combined with the eukaryote PR2 and MIDORI mitochondrial databases. After removing incongruent, chimeric and duplicate SSU sequences, this combination (GTDB+) provides a small improvement in annotation of a set of estuarine Archaea Operational Taxonomic Units (OTUs) compared to SILVA. We add to this a collection of near full length rRNA sequences and the prokaryote SSU sequences in SILVA, creating a new reference database, KSGP (Karst, Silva, GTDB, and PR2). The additional sequences are (re-)annotated using three different approaches. The most conservative, using lowest common ancestor, gives a further small improvement. Annotation using SINTAX increases Class and Order assignments by 2.7 and 4.2 times over SILVA, although this may include some “lumping” of un-named and named clades. Still further improvement can be made using similarity based clustering to group database sequences into putative taxa at all taxonomic levels, assigning 60% and 41% of Archaea OTUs to putative family and genus level taxa respectively. GTDB without cleaning and GreenGenes2 both perform poorly and cannot be recommended for use with Archaea. We make the GTDB+ and KSGP databases available at ksgp.earlham.ac.uk; integrate them into a metabarcoding pipeline, LotuS2 and outline their use to annotate Archaea OTUs and metatranscriptomic data.
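The lowest common ancestor idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the LotuS2 or KSGP implementation; the function name, the rank order, and the example lineages are assumptions chosen for illustration only. Given the lineages of several reference hits for one sequence, the sketch keeps only the ranks on which all hits agree.

```python
# Minimal sketch of lowest-common-ancestor (LCA) annotation.
# Assumption: each lineage is a list of names ordered from domain downwards.
# This is an illustration only, not the method used by LotuS2 or KSGP.

def lowest_common_ancestor(lineages):
    """Return the longest prefix of ranks shared by every lineage."""
    if not lineages:
        return []
    shared = []
    # zip(*lineages) walks the ranks in parallel and stops at the shortest lineage.
    for names in zip(*lineages):
        # Stop at the first rank where hits disagree or the name is empty.
        if len(set(names)) != 1 or not names[0]:
            break
        shared.append(names[0])
    return shared


# Hypothetical hits for one OTU (names are illustrative, not from the paper).
hits = [
    ["Archaea", "Thermoproteota", "Nitrososphaeria", "Nitrososphaerales",
     "Nitrosopumilaceae", "Nitrosopumilus"],
    ["Archaea", "Thermoproteota", "Nitrososphaeria", "Nitrososphaerales",
     "Nitrosopumilaceae", "Nitrosarchaeum"],
    ["Archaea", "Thermoproteota", "Nitrososphaeria", "Nitrososphaerales",
     "Nitrososphaeraceae", "Nitrososphaera"],
]

annotation = lowest_common_ancestor(hits)
print(annotation)
```

Because the three hypothetical hits diverge at family level, the sketch assigns the sequence only down to order. This shows why an approach of this kind is conservative: any disagreement among hits truncates the annotation at the last shared rank.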
Item Type: | Article |
---|---|
Additional Information: | Data accessibility: The GTDB and KSGP database, including the KSGP LCA and KSGP+ annotations are available at ksgp.earlham.ac.uk. Unprocessed DNA sequences for the Archaea are available as ENA accession PRJEB65254. Sequences of the Archaea and Bacteria OTUs; the raw RNA sequencing data and sequences removed from source databases during the construction of KSGP 3.1 are available at https://github.com/AGrantUEA/KSGP. AA was supported by King Saud University. SCU was supported by the Commonwealth Scholarship Commission. MB was supported by the Swedish Research Council (Vetenskapsrådet; Grant 2021–03724). JF was supported by the UKRI Biotechnology and Biological Sciences Research Council Norwich Research Park Biosciences Doctoral Training Partnership, BB/T008717/1. FH was supported by European Research Council H2020 StG (erc-stg-948 219, EPYC) and by the Biotechnology and Biological Sciences Research Council (BBSRC) Institute Strategic Programme (ISP) Food Microbiome and Health BB/X011054/1 and its constituent project BBS/E/F/000PR13631; Earlham ISP BBX011089/1 and its constituent work package BBS/E/ER/230002A. |
Faculty \ School: | Faculty of Science > School of Environmental Sciences; University of East Anglia Research Groups/Centres > Theme - ClimateUEA; Faculty of Science > School of Biological Sciences |
UEA Research Groups: | Faculty of Science > Research Groups > Centre for Ocean and Atmospheric Sciences; Faculty of Science > Research Groups > Environmental Biology |
Depositing User: | LivePure Connector |
Date Deposited: | 30 Jun 2025 11:30 |
Last Modified: | 30 Jun 2025 11:30 |
URI: | https://ueaeprints.uea.ac.uk/id/eprint/99778 |
DOI: | 10.1093/ismeco/ycaf094 |