Computational approaches for advancing our understanding of marine microbes

Duncan, Anthony (2023) Computational approaches for advancing our understanding of marine microbes. Doctoral thesis, University of East Anglia.

PDF: AD 230525 Final PhD Thesis.pdf
Restricted to Repository staff only until 30 November 2023.

Abstract

Ocean microbes are essential for marine life, forming the base of the ocean food web and contributing to the biogeochemical cycling of essential nutrients. The advent of modern molecular genetics techniques has revealed a large degree of diversity among these microbial communities, and metagenomic sequencing allows insight into the total metabolic potential and phylogeny of their constituent organisms. In this thesis, we develop two new computational approaches to analysing metagenomic sequencing data in order to advance our understanding of marine microbes.

Many marine microbes cannot be grown under lab conditions, but methods to obtain metagenome-assembled genomes (MAGs) from metagenomic data have been widely applied to prokaryotes. However, many of the most abundant and environmentally significant microbes are eukaryotic, and few eukaryotic MAGs have been recovered. To address this gap, we designed and implemented a pipeline for automated recovery of eukaryotic MAGs. From 12 samples, we obtained 21 MAGs from lineages including diatoms and prasinophytes. Our analysis of these eukaryotes, alongside prokaryotes from the same samples, showed a demarcation between polar and non-polar communities. The highest-quality MAG has been included in the algal genomics resource PhycoCosm as Micromonas sp. AD1.

We also want to understand the functional capability of the whole microbial community, as well as that of individual organisms. Functions are known to be shared between organisms and pathways, so we developed an unsupervised machine learning approach using Non-Negative Matrix Factorisation (NMF) to identify modules of functions which reflect this expected sharing. Interpreting the resulting decomposition is important for exploratory analysis, and for this task we developed the Leave-One-Out Correlation Decrease (LOOCD) method, which performs well at identifying shared functions. Our methods successfully recover modules in simulated sequencing data and in real-world case studies, both identifying established groups (e.g. surface and mesopelagic ocean) and yielding modules with meaningful biological interpretations.
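
As a rough illustration of the decomposition step only, the sketch below factorises a small sample-by-function abundance matrix into two modules using standard multiplicative-update NMF. It is not the thesis implementation: the toy matrix, the function name nmf, the choice of k = 2 and the Frobenius objective are all assumptions made for this example, and the LOOCD interpretation step is not shown.

```python
import numpy as np

def nmf(X, k, n_iter=1000, seed=0, eps=1e-9):
    """Factorise a non-negative matrix X (samples x functions) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius-norm objective.
    W holds the weight of each module in each sample; H holds the weight of
    each function in each module. (Illustrative helper, not from the thesis.)
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # The updates multiply by non-negative ratios, so W and H stay non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data (assumed): 6 samples, 8 functions, two planted modules.
# The function at column index 4 belongs to both modules, mimicking a shared function.
true_W = np.array([[1, 0], [2, 0], [3, 0], [0, 1], [0, 2], [0, 3]], dtype=float)
true_H = np.array([[1, 1, 1, 1, 1, 0, 0, 0],
                   [0, 0, 0, 0, 1, 1, 1, 1]], dtype=float)
X = true_W @ true_H

W, H = nmf(X, k=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print("relative reconstruction error:", rel_err)
print("module weights per function:")
print(np.round(H, 2))
```

Because both factors are constrained to be non-negative, a function can carry weight in more than one row of H, which is why this family of methods suits settings where functions are expected to be shared between modules.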

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Computing Sciences
Depositing User: Nicola Veasy
Date Deposited: 22 Jun 2023 10:20
Last Modified: 22 Jun 2023 10:20
URI: https://ueaeprints.uea.ac.uk/id/eprint/92469
