Decoding the regulatory regions of crop genomes using bioinformatics and machine learning

Mahony, Lucy (2025) Decoding the regulatory regions of crop genomes using bioinformatics and machine learning. Doctoral thesis, University of East Anglia.

Full text: 2025MahonyLPhD.pdf (PDF)
Restricted to Repository staff only until 31 May 2027.

Abstract

The regulation of gene expression is complex, influenced by a wide range of pre- and post-transcriptional processes and encoded in an array of locations throughout the genome. Our understanding of gene regulation remains incomplete, and comprehensively inferring function from sequence is still beyond current capabilities. Regulatory regions have potential agronomic utility as locations of genetic variation for modulating the expression of traits of interest in crop breeding, which could help meet the global demand for wheat under the pressures of a rising population and a changing climate. This thesis aims to decode regulatory regions in crop genomes, first through a global analysis of the transcription start site landscape in wheat. This dataset led to the de novo discovery of putative cis-regulatory elements (CREs) in the wheat core promoter that correlated with tissue specificity. In parallel, machine learning (ML) models were trained on promoter and gene body sequences to predict expression patterns, and model interpretation techniques were then applied to pinpoint the most predictive sequence features and thereby identify putative CREs. Classical k-mer-based ML approaches were benchmarked against neural network architectures, alternative input feature encoding methods, and foundation genomic large language models (gLLMs). Although the classical k-mer-based ML approaches were not surpassed, gLLMs provided a mechanism for identifying putative regulatory regions of varying length together with their positional information, and this was used to identify putative CREs. The identified CREs included known transcription factor binding sites as well as novel motifs, particularly homopolymeric (dA-dT) sequences. Together, the work presented in this thesis advances the current understanding of wheat and soybean CREs, furthering knowledge of gene regulation and of the techniques with which regulatory regions can be studied.
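The abstract does not specify how sequences were encoded for the classical k-mer-based models. As a hedged illustration only, one common k-mer encoding counts every overlapping substring of length k in a promoter or gene body sequence, producing a fixed-length vector of 4^k counts that a classical classifier can consume. The sketch below assumes an ACGT alphabet, uses k=3 for brevity, and skips k-mers containing other characters such as N; the helper name `kmer_counts` is hypothetical and is not taken from the thesis.

```python
from itertools import product


def kmer_counts(seq: str, k: int = 3) -> list[int]:
    """Count overlapping k-mers in a DNA sequence.

    Hypothetical helper for illustration, not the thesis's code.
    Returns a vector of length 4**k, ordered lexicographically over A, C, G, T.
    k-mers containing characters outside ACGT (e.g. N) are skipped.
    """
    # Fixed vocabulary so every sequence maps into the same feature space.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    seq = seq.upper()
    # Slide a window of width k one base at a time (overlapping k-mers).
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:  # ignore k-mers with ambiguous bases
            counts[i] += 1
    return counts


if __name__ == "__main__":
    # A short homopolymeric dA-dT stretch, the motif class highlighted above.
    features = kmer_counts("AAAAATTTTT", k=3)
    print(len(features))  # 64
    print(features[0])    # 3 (count of "AAA")
```

Under this assumed encoding, a classical model trained on such vectors can report feature importances that map directly back to individual k-mers, which is one way model interpretation can point towards candidate motifs such as homopolymeric runs.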

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Biological Sciences
Depositing User: Chris White
Date Deposited: 18 May 2026 10:08
Last Modified: 18 May 2026 10:08
URI: https://ueaeprints.uea.ac.uk/id/eprint/103061