Keane, Ann-Marie (2020) Computing a Yeast Tree of Life. Doctoral thesis, University of East Anglia.
Abstract
This study aimed to compare the results of distinct state-of-the-art phylogenetic tree-building methodologies for a key yeast NGS dataset, with the ultimate goal of establishing a yeast tree of life. Draft genome assemblies of seventy-five species from the Saccharomyces complex, a well-studied group of species of academic and industrial importance, first underwent a stringent quality control process, along with a dataset from an outgroup species. This process uncovered a vast amount of genomic information. New, good-quality genome assemblies were introduced for six Saccharomyces complex species and for four strains.
Key genomic differences were found in a quality-controlled subset of this dataset, including varying genome sizes (8-29 Mbp), coding genome proportions (54-77%) and number of genes (4,131-11,243). The total GC content was also found to vary significantly across the dataset, ranging from 31.7% in a Tetrapisispora blattae strain to 52% in a representative of Torulaspora globosa. The core genome of forty Saccharomyces complex species was also identified in this study and it was found that 591 genes with ≥50% amino-acid sequence identity were present across all strains.
Phylogenetic trees were then built from the full 76-species dataset, using Maximum Likelihood approaches for a seven-region Multi-Locus Sequence Typing dataset and a 1,711-gene BUSCO dataset, along with three variations of a recently developed NGS alignment-free approach, Feature Frequency Profiles (FFP). The resulting trees were then compared and all were found to differ, although the BUSCO and FFP 20-letter amino acid trees were markedly superior to those from the other approaches. Despite the success of the FFP 20-letter amino acid approach for the Saccharomyces complex dataset, simulation studies confirmed a sequence length bias with the FFP two-letter RY alphabet and a GC bias with the FFP four-letter DNA alphabet approaches. In an effort to overcome the biases within the current FFP approach, a new software tool, jellyphy, was developed. Further development of tools such as this will undoubtedly lead to new methods capable of accurate phylogenetic estimation from yeast NGS datasets.
Item Type: | Thesis (Doctoral) |
---|---|
Faculty \ School: | Faculty of Science > School of Biological Sciences |
Depositing User: | Chris White |
Date Deposited: | 19 Oct 2021 07:18 |
Last Modified: | 19 Oct 2021 07:18 |
URI: | https://ueaeprints.uea.ac.uk/id/eprint/81781 |