Deep learning and (meta)genomics to characterise ice-binding proteins and predict population extinction risk

Winder, Johanna (2025) Deep learning and (meta)genomics to characterise ice-binding proteins and predict population extinction risk. Doctoral thesis, University of East Anglia.

PDF (jw_thesis_corrected_9Dec25.pdf)
Restricted to Repository staff only until 31 October 2028.

Abstract

Genomes reflect the evolutionary history and adaptive potential of organisms. Understanding how genetic diversity permits adaptation to extreme and changing environments will guide efforts to preserve it. Despite their harshness, frozen environments harbour diverse microbiota. Microorganisms produce ice-binding proteins (IBPs) with the domain of unknown function 3494 (DUF3494) as an adaptation to freezing, allowing them to maintain liquid habitats and prevent cellular damage from ice. IBPs display a range of functions in addition to freezing point depression. This thesis explores the diversity of microbial IBPs in metagenomes and individual genomes, and develops a deep-learning model to identify environment-specific differences between proteins. This method of classifying genomic data with machine learning is then used to predict long-term extinction risk of populations from genomic features only.

Metagenomes and metagenome-assembled genomes from the central Arctic Ocean were used to investigate the taxonomy, predicted protein structures, domain architectures and synteny of IBPs. We expanded the previously known complement of IBPs by an order of magnitude, revealing a novel extended DUF3494 structure and possible taxon-specific domain shuffling. We then built an interpretable deep learning model to classify IBPs across frozen environments. We investigated 50,669 IBPs from glacier ice, rock, subsurface, frozen sediment and polar marine environments, classifying 75.9% to 97.8% correctly and revealing the protein structural features that drove these differences. We then investigated IBPs in single genomes, comparing IBPs from 10 strains of the cold-adapted diatom Fragilariopsis cylindrus. This revealed that, following acquisition via horizontal gene transfer and subsequent duplications, most IBPs in the genome are under purifying selection, while some show signs of divergence. Finally, we tested machine learning models that could predict extinction risk of simulated populations from their genomic features. Models learned genomic indicators of demography to make predictions, and including life-history traits such as fecundity improved model performance.

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Environmental Sciences
Depositing User: Chris White
Date Deposited: 27 Jan 2026 13:48
Last Modified: 27 Jan 2026 13:48
URI: https://ueaeprints.uea.ac.uk/id/eprint/101745
