Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise

Peona, Valentina, Blom, Mozes P. K., Xu, Luohao, Burri, Reto, Sullivan, Shawn, Bunikis, Ignas, Liachko, Ivan, Haryoko, Tri, Jønsson, Knud A., Zhou, Qi, Irestedt, Martin and Suh, Alexander ORCID: https://orcid.org/0000-0002-8979-9992 (2021) Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. Molecular Ecology Resources, 21 (1). pp. 263-286. ISSN 1755-098X

PDF (Published_Version) - Published Version
Available under License Creative Commons Attribution.

Abstract

Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat-rich and GC-rich regions (genomic “dark matter”) limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long-read, linked-read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC-rich microchromosomes and the repeat-rich W chromosome. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.

Item Type: Article
Uncontrolled Keywords: gc content; hi-c; chromosome-level assembly; genome assembly; long reads; satellite repeat; transposable element; biotechnology; ecology, evolution, behavior and systematics; genetics
Faculty \ School: Faculty of Science > School of Biological Sciences
Depositing User: LivePure Connector
Date Deposited: 02 Oct 2020 23:56
Last Modified: 18 Oct 2024 23:54
URI: https://ueaeprints.uea.ac.uk/id/eprint/77099
DOI: 10.1111/1755-0998.13252
