Computing a Yeast Tree of Life

Keane, Ann-Marie (2020) Computing a Yeast Tree of Life. Doctoral thesis, University of East Anglia.


Abstract

This study aimed to compare the results of distinct state-of-the-art phylogenetic tree-building methodologies for a key yeast NGS dataset, with the ultimate goal of establishing a yeast tree of life. Draft genome assemblies of seventy-five species from the Saccharomyces complex, a well-studied group of species of academic and industrial importance, first underwent a stringent quality control process, along with a dataset from an outgroup species. This process uncovered a vast amount of genomic information. New, good-quality genome assemblies were introduced for six Saccharomyces complex species and for four strains.

Key genomic differences were found in a quality-controlled subset of this dataset, including varying genome sizes (8-29 Mbp), coding genome proportions (54-77%) and numbers of genes (4,131-11,243). The total GC content was also found to vary significantly across the dataset, ranging from 31.7% in a Tetrapisispora blattae strain to 52% in a representative of Torulaspora globosa. The core genome of forty Saccharomyces complex species was also identified in this study, and 591 genes with ≥50% amino-acid sequence identity were found to be present across all strains.
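As a rough illustration of the GC content statistic quoted above, the following Python sketch computes the percentage of G and C bases in a nucleotide sequence. It is not taken from the thesis: the function name gc_content is hypothetical, and the choice to exclude ambiguous bases such as N from the denominator is an assumption, since the abstract does not say how such bases were handled.

```python
def gc_content(seq: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols (e.g. N) are ignored rather than
    counted in the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # An empty or fully ambiguous sequence has no defined GC content;
    # this sketch returns 0.0 in that case.
    return 100.0 * gc / total if total else 0.0
```

For a whole assembly, the same counts would be summed over all contigs before dividing, so that long contigs are weighted by their length.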

Phylogenetic trees were then built from the full 76-species dataset using Maximum Likelihood approaches for a seven-region Multi-Locus Sequence Typing dataset and a 1,711-gene BUSCO dataset, along with three variations of a recently developed alignment-free NGS approach, Feature Frequency Profiles (FFP). The resulting trees were then compared: all were found to differ, though the BUSCO and FFP 20-letter amino acid trees were far superior to those from the other approaches. Despite the success of the FFP 20-letter amino acid approach for the Saccharomyces complex dataset, simulation studies confirmed a sequence length bias with the FFP two-letter RY alphabet approach and a GC bias with the FFP four-letter DNA alphabet approach. In an effort to overcome the biases within the current FFP approach, a new software tool, jellyphy, was developed. Further development of tools such as this will undoubtedly lead to new methods capable of accurate phylogenetic estimation from yeast NGS datasets.
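The general idea behind an alignment-free, feature-frequency comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FFP software or jellyphy: the function names ffp and jensen_shannon are hypothetical, the feature length k is a free parameter, reverse complements are not merged, and the Jensen-Shannon divergence is used here as one common way of comparing two frequency profiles. The RY recoding maps purines (A, G) to R and pyrimidines (C, T) to Y.

```python
from collections import Counter
from math import log2

# Purines (A, G) -> R, pyrimidines (C, T) -> Y.
RY_TABLE = str.maketrans("ACGT", "RYRY")


def ffp(seq: str, k: int, alphabet: str = "dna") -> dict[str, float]:
    """Relative frequencies of all length-k features in seq.

    alphabet="dna" keeps the four-letter alphabet; alphabet="ry"
    recodes to the two-letter RY alphabet first. Features containing
    any other symbol (e.g. N) are skipped (an assumption of this sketch).
    """
    seq = seq.upper()
    valid = set("ACGT")
    if alphabet == "ry":
        seq = seq.translate(RY_TABLE)
        valid = set("RY")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def jensen_shannon(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence (base 2) between two profiles."""
    div = 0.0
    for feature in set(p) | set(q):
        pf, qf = p.get(feature, 0.0), q.get(feature, 0.0)
        m = (pf + qf) / 2
        if pf:
            div += 0.5 * pf * log2(pf / m)
        if qf:
            div += 0.5 * qf * log2(qf / m)
    return div
```

Pairwise divergences computed this way for every pair of genomes would form a distance matrix, from which a tree could be built with a distance-based method such as neighbour joining. Because the profiles are built from raw feature counts, they depend on base composition and on how many features a sequence contributes, which gives some intuition for why GC and length biases of the kind reported above can arise.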

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Biological Sciences
Depositing User: Chris White
Date Deposited: 19 Oct 2021 07:18
Last Modified: 19 Oct 2021 07:18
URI: https://ueaeprints.uea.ac.uk/id/eprint/81781