Wheat haplotype diversity by a k-mer based approach

Quiroz Chavez, Jesus (2022) Wheat haplotype diversity by a k-mer based approach. Doctoral thesis, University of East Anglia.

Abstract

Wheat is the second most widely cultivated crop and a staple food across the globe. The hexaploid form has a very large, polyploid, complex, and highly repetitive genome. Because of this size and complexity, wheat lagged behind other crops in genomic studies. With advances in next-generation sequencing (NGS), genomics is now progressing rapidly for many crops, including bread wheat. We now face the challenge of how to better exploit these resources in breeding to benefit food security. The main objective of this work was to develop a method to define haplotypes, and to build a haplotype database in wheat, in order to explore the genetic diversity in landraces and modern cultivars and to link genome information with phenotypes. We embraced the challenge of using whole-genome sequencing (WGS) data at ~12-fold coverage from more than 1,000 genomes.

We developed IBSpy, a method to detect genetic variations from raw reads using k-mers. We benchmarked this method against previous genome alignments to detect regions that are identical by state (>99.99% sequence identity). We characterized the parameters that affect the results and provide guidance on how to apply the method in specific situations. Our method detects variations at the same resolution as full genome assemblies and condenses multiple types of sequences and variations into a single form.
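The general idea of detecting variation from raw reads with k-mers can be illustrated with a minimal Python sketch. This is not the IBSpy implementation: the function names, the toy k-mer size, and the window size are all assumptions chosen for illustration. The sketch collects the canonical k-mers found in a sample's reads and then counts, for each window of a reference sequence, how many reference k-mers are missing from the sample. Windows with a count of zero are candidates for being identical by state.

```python
from typing import Iterable, List, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def kmers(seq: str, k: int) -> Iterable[str]:
    """Yield the canonical form of every k-mer in seq, skipping ambiguous bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield canonical(kmer)


def sample_kmer_set(reads: Iterable[str], k: int) -> Set[str]:
    """Collect all canonical k-mers observed in a sample's raw reads."""
    observed: Set[str] = set()
    for read in reads:
        observed.update(kmers(read, k))
    return observed


def missing_kmers_per_window(reference: str, observed: Set[str],
                             k: int, window: int) -> List[int]:
    """Count reference k-mers absent from the sample, grouped by start position.

    A k-mer belongs to the window that contains its start coordinate.
    Reference k-mers containing ambiguous bases are ignored.
    """
    reference = reference.upper()
    n_starts = len(reference) - k + 1
    if n_starts <= 0:
        return []
    counts = [0] * ((n_starts + window - 1) // window)
    for start in range(n_starts):
        kmer = reference[start:start + k]
        if not set(kmer) <= set("ACGT"):
            continue
        if canonical(kmer) not in observed:
            counts[start // window] += 1
    return counts


if __name__ == "__main__":
    # Toy values only (assumed for illustration, not taken from the thesis).
    ref = "ACGTTGCAAGGCTTAACCGG"
    reads = ["ACGTTGCAAGGCGTAACCGG"]  # one substitution relative to ref
    print(missing_kmers_per_window(ref, sample_kmer_set(reads, 5), k=5, window=8))
```

In this toy setting a single substitution removes every reference k-mer that overlaps it, so one variant produces a run of up to k missing k-mers in the affected window.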

Using these variations, we defined haplotypes at 1 Mbp resolution with a multi-genome approach and built a haplotype database from the >1,000 genotypes. We tracked haplotypes from landraces into modern cultivars and found that large haplotype blocks were brought into modern cultivars from landraces and have been maintained through more than 80 years of breeding. Using these haplotypes, we conducted a haplotype-based GWAS and detected genome regions associated with disease resistance (wheat blast and yellow rust) and spike-related traits. We also identified novel, unexploited haplotypes in landraces that are absent from modern cultivars. This method integrates pangenome-informed haplotypes to capture genome regions private to each assembly and can handle large WGS datasets.

We showed that IBSpy efficiently detects known and novel hybridisations/introgressions in the wheat pangenome and in landraces at 50 Kbp resolution. We characterized a collection of Triticum monococcum and Aegilops tauschii accessions, as well as large introgressions from multiple wild relatives, and propose candidate genotypes as the closest donors of those hybridisations/introgressions. Using these haplotypes, we identified novel hybridisations of Ae. tauschii in the D subgenome of wheat that are absent from the pangenome references. These results demonstrate the utility of our haplotype calls as an alternative to conventional alignment-based methods. We created a flexible and broad k-mer-based haplotype database to which novel 12-fold WGS genotypes can be added and easily integrated.

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Biological Sciences
Depositing User: Chris White
Date Deposited: 26 Oct 2023 12:37
Last Modified: 31 Jul 2024 01:38
URI: https://ueaeprints.uea.ac.uk/id/eprint/93480