Using 2k + 2 bubble searches to find single nucleotide polymorphisms in k-mer graphs

Younsi, Reda and Maclean, Dan (2015) Using 2k + 2 bubble searches to find single nucleotide polymorphisms in k-mer graphs. Bioinformatics, 31 (5). pp. 642-646. ISSN 1367-4803

PDF (Published_Version) - Published Version
Available under License Creative Commons Attribution.


Abstract

Motivation: Single nucleotide polymorphism (SNP) discovery is an important preliminary for understanding genetic variation. With current sequencing methods, we can sample genomes comprehensively. SNPs are found by aligning sequence reads against longer assembled references. De Bruijn graphs are efficient data structures that can deal with the vast amount of data from modern technologies. Recent work has shown that the topology of these graphs captures enough information to allow the detection and characterization of genetic variants, offering an alternative to alignment-based methods. Such methods rely on depth-first walks of the graph to identify closing bifurcations. These methods are conservative or generate many false-positive results, particularly when traversing highly inter-connected (complex) regions of the graph or in regions of very high coverage.

Results: We devised an algorithm that calls SNPs in converted De Bruijn graphs by enumerating 2k + 2 cycles. We evaluated the accuracy of predicted SNPs by comparison with SNP lists from alignment-based methods. We tested accuracy of the SNP calling using sequence data from 16 ecotypes of Arabidopsis thaliana and found that accuracy was high. We found that SNP calling was even across the genome and genomic feature types. Using sequence-based attributes of the graph to train a decision tree allowed us to increase accuracy of SNP calls further. Together these results indicate that our algorithm is capable of finding SNPs accurately in complex sub-graphs and potentially comprehensively from whole genome graphs.
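
To illustrate why a SNP corresponds to a cycle of 2k + 2 nodes, the following minimal Python sketch builds a k-mer graph from two short sequences that differ at one base and looks for the resulting bubble. It is not the authors' implementation: the abstract describes enumerating 2k + 2 cycles in a converted graph, whereas this sketch uses a simpler directed branch-and-reconverge walk. The function names (build_kmer_graph, find_snp_bubbles) and the example sequences are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_kmer_graph(seqs, k):
    """Directed k-mer graph: nodes are k-mers, edges join consecutive k-mers."""
    succ = defaultdict(set)
    for s in seqs:
        for i in range(len(s) - k):
            succ[s[i:i + k]].add(s[i + 1:i + k + 1])
    return succ


def find_snp_bubbles(succ, k):
    """Return (start, path_a, path_b, end) tuples for simple SNP-like bubbles.

    A single-base difference changes exactly k consecutive k-mers, so each
    allele contributes k private nodes. Together with the shared start and
    end nodes, the bubble forms a cycle of 2k + 2 nodes.
    Assumption: every node inside a branch has exactly one successor.
    """
    bubbles = []
    for start, nexts in list(succ.items()):
        if len(nexts) != 2:
            continue  # only two-way bifurcations are considered here
        paths = []
        for first in sorted(nexts):
            path = [first]
            # Extend the branch while it is unbranched, up to k nodes.
            while len(path) < k and len(succ.get(path[-1], ())) == 1:
                path.append(next(iter(succ[path[-1]])))
            paths.append(path)
        a, b = paths
        if len(a) != k or len(b) != k or set(a) & set(b):
            continue
        end_a = succ.get(a[-1], set())
        end_b = succ.get(b[-1], set())
        # Both branches must close on the same single node.
        if len(end_a) == 1 and end_a == end_b:
            bubbles.append((start, a, b, next(iter(end_a))))
    return bubbles


if __name__ == "__main__":
    k = 3
    ref = "ACGTTGCAAGGCTA"
    alt = "ACGTTGTAAGGCTA"  # same sequence with C -> T at position 6
    graph = build_kmer_graph([ref, alt], k)
    for start, a, b, end in find_snp_bubbles(graph, k):
        cycle_size = len(a) + len(b) + 2
        # The last base of the first node on each branch is the variant base.
        print(start, a, b, end, cycle_size, a[0][-1], b[0][-1])
```

With k = 3, the two branches each hold three k-mers, so the bubble spans 2 × 3 + 2 = 8 nodes. Real read data would add sequencing errors, repeats and reverse complements, none of which this sketch handles.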

Item Type: Article
Faculty \ School: Faculty of Science > The Sainsbury Laboratory
Faculty of Science > School of Computing Sciences
Depositing User: LivePure Connector
Date Deposited: 10 Nov 2020 01:17
Last Modified: 22 Oct 2022 07:26
URI: https://ueaeprints.uea.ac.uk/id/eprint/77609
DOI: 10.1093/bioinformatics/btu706
