Computational approaches for advancing our understanding of marine microbes

Duncan, Anthony (2023) Computational approaches for advancing our understanding of marine microbes. Doctoral thesis, University of East Anglia.



Ocean microbes are essential for marine life, forming the base of the ocean food web and contributing to biogeochemical cycling of essential nutrients. The advent of modern molecular genetics techniques has revealed a large degree of diversity among these microbial communities, and metagenomic sequencing allows insight into the total metabolic potential and phylogeny of their constituent organisms. In this thesis, we develop two new computational approaches to analysing metagenomic sequencing data in order to advance our understanding of marine microbes.

Many marine microbes cannot be grown under lab conditions, but methods to obtain metagenome-assembled genomes (MAGs) from metagenomic data have been widely applied for prokaryotes. However, many of the most abundant and environmentally significant microbes are eukaryotic, and few eukaryotic MAGs have been recovered. To address this gap, we designed and implemented a pipeline for automated recovery of eukaryotic MAGs. From 12 samples, we obtained 21 MAGs from lineages including diatoms and prasinophytes. Our analysis of these eukaryotes, alongside prokaryotes from the same samples, showed a demarcation between polar and non-polar communities. The highest quality MAG has been included in the algal genomics resource PhycoCosm as Micromonas sp. AD1.

We also want to understand the functional capability of the whole microbial community, as well as individual organisms. Functions are known to be shared between organisms and pathways, so we developed an unsupervised machine learning approach using the Non-Negative Matrix Factorisation (NMF) decomposition method to identify modules of functions which reflect this expected sharing of functions. Interpreting the resulting decomposition is important for exploratory analysis, and we developed the Leave-One-Out Correlation Decrease (LOOCD) method for this task, which performs well at identifying shared functions. Our methods successfully recover modules in simulated sequencing data and in real-world case studies, both identifying established groups (e.g. surface and mesopelagic ocean) and having meaningful biological interpretation.
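For readers unfamiliar with NMF, the following is a minimal sketch of the general technique and not the implementation used in the thesis. It factorises a small, made-up functions-by-samples abundance table X into non-negative matrices W (functions by modules) and H (modules by samples) using the standard Lee and Seung multiplicative updates for the squared-error objective. The function names, the toy data, the rank and the iteration count are all illustrative assumptions. LOOCD is not shown, because the abstract does not define it.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(x, w, h):
    """Frobenius norm of the residual X - WH."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2
               for xr, ar in zip(x, wh)
               for xv, av in zip(xr, ar)) ** 0.5


def nmf(x, rank, iterations=500, seed=0, eps=1e-9):
    """Factorise non-negative X (n x m) into W (n x rank) and H (rank x m).

    Uses multiplicative updates, which keep W and H non-negative as long
    as they start non-negative. This is a generic textbook scheme, assumed
    here for illustration only.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_rows)]
    return w, h


# Hypothetical toy table: rows are functions, columns are samples.
# Functions 0-1 occur only in samples 0-2, functions 3-4 only in
# samples 3-5, and function 2 is shared across all samples.
X = [
    [5, 4, 6, 0, 0, 0],
    [4, 5, 5, 0, 0, 0],
    [3, 3, 3, 3, 3, 3],
    [0, 0, 0, 6, 5, 4],
    [0, 0, 0, 5, 6, 5],
]

W, H = nmf(X, rank=2)
print("reconstruction error:", round(frobenius_error(X, W, H), 3))
for i, row in enumerate(W):
    print("function", i, "module weights:", [round(v, 2) for v in row])
```

In a table like this, a function shared between groups would be expected to carry weight in more than one column of W, which is the kind of sharing the module-based approach described above is meant to expose. Results depend on the random initialisation and the chosen rank, so any real analysis would need a principled way to select both.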

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Computing Sciences
Depositing User: Nicola Veasy
Date Deposited: 22 Jun 2023 10:20
Last Modified: 30 Nov 2023 01:38

