Novel taxonomic profiling and benchmarking methods for high-resolution metagenomics

Fritscher, Joachim (2024) Novel taxonomic profiling and benchmarking methods for high-resolution metagenomics. Doctoral thesis, University of East Anglia.


Abstract

The ever-increasing scale of available metagenomic data demands tools that are both fast and accurate. This comes at a time when assembly-based metagenomics has substantially increased the bacterial diversity represented in taxonomic databases, which holds great potential for accurate and fast taxonomic profilers. Yet current metagenomic profilers and strain-resolved tools rarely utilise the vast known taxonomic diversity, nor do they leverage state-of-the-art approaches to provide both efficient and accurate metagenomic taxa profiles. Given the importance of both species- and strain-level analysis for gaining insights into the workings of microbial communities, there is a need to improve our understanding of how current tools work, and to improve the integration of available genetic resources using standardised and community-supported databases.

This thesis presents approaches towards understanding the limitations of species- and strain-resolved metagenomic analysis and introduces two newly developed metagenomic profilers. First, benchpro is a tool for in-depth benchmarking of taxonomic profilers using synthetic metagenomes. Benchpro disentangles the signal of false predictions by introducing a shared phylogenetic context between the gold-standard profile and the prediction. Second, varkit is a fast k-mer-based taxonomic profiler that detects de novo SNPs based on k-mer match patterns relative to a taxonomic database. Lastly, protal is an alignment-based approach to taxonomic profiling and strain-resolved analysis that demonstrates how the integration of k-mer and alignment-based concepts can elevate accuracy and efficiency. Protal provides sensitive and accurate analysis at species level while being 6.5 times faster than similar profilers. At strain level, protal builds on a custom alignment algorithm and leverages it in a reference-guided multiple sequence alignment algorithm, achieving speed-ups of up to 55-fold.

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Biological Sciences
Depositing User: Chris White
Date Deposited: 07 Jul 2025 10:27
Last Modified: 07 Jul 2025 10:27
URI: https://ueaeprints.uea.ac.uk/id/eprint/99844
