Computational methods for the analysis of next generation viral sequences

Lamzin, Sergey (2016) Computational methods for the analysis of next generation viral sequences. Doctoral thesis, University of East Anglia.


Abstract

Recent advances in sequencing technologies have brought a renewed impetus to the
development of bioinformatics tools necessary for sequence processing and analysis.
Along with the constant requirement to assemble more complex genomes
from ever-evolving sequencing experiments and technologies, there is also a lack of
visually accessible representations of the information generated by analysis tools.
Most of the novel algorithms, specifically for de novo genome assembly of next generation
sequencing (NGS) data, are not able to efficiently handle data generated on large
populations. We have assessed the common methods for genome assembly used today,
both from a theoretical point of view and in their practical implementations.
In this dissertation we present StarK (stands for k*), a novel assembly algorithm with
a new data structure designed to overcome some of the limitations that we observed in
established methods, enabling higher quality NGS data processing.
The StarK approach structurally combines de Bruijn graphs for all possible dimensions
in one supergraph. Although the technique to join reads remains in concept the same,
the dimension k is no longer fixed. StarK is designed in such a way that it allows the
assembler to dynamically adjust the de Bruijn graph dimension k on the fly and at any
given nucleotide position without losing connections between graph vertices or doing
complicated calculations. The new graph uses localised coverage difference evaluation
to create connected subgraphs, which allows higher resolution of genomic differences
and helps differentiate errors from potential variants within the sequencing sample.
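
As a rough illustration of holding de Bruijn graphs for several dimensions side by side, the following minimal Python sketch builds one graph per k and looks up a vertex in the graph that matches its length. It is not the StarK data structure or algorithm described in the thesis; the function names (kmer_counts, build_multi_k_graph, successors) and the simple dictionary layout are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def kmer_counts(reads, k):
    """Count how often each k-mer occurs in the reads (its coverage)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_multi_k_graph(reads, k_min, k_max):
    """Build one de Bruijn graph per k in [k_min, k_max].

    Vertices are k-mers. An edge u -> v exists when some (k+1)-mer in the
    reads has u as its prefix and v as its suffix; the edge stores the
    coverage of that (k+1)-mer. This layout is hypothetical, not StarK's.
    """
    graphs = {}
    for k in range(k_min, k_max + 1):
        edges = defaultdict(dict)
        for kp1_mer, coverage in kmer_counts(reads, k + 1).items():
            edges[kp1_mer[:-1]][kp1_mer[1:]] = coverage
        graphs[k] = dict(edges)
    return graphs


def successors(graphs, vertex):
    """Return the successors of a vertex in the graph whose k equals its length."""
    return graphs.get(len(vertex), {}).get(vertex, {})


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG"]
    graphs = build_multi_k_graph(reads, 2, 3)
    print(successors(graphs, "AC"))
    print(successors(graphs, "ACG"))
```

Because the graph is chosen by the length of the queried vertex, a caller can switch between dimensions at any position simply by querying a shorter or longer k-mer. The coverage stored on each edge is the kind of value a localised coverage comparison could inspect.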
In addition to this we present a bioinformatics analysis pipeline for high-variation viral
population analysis (including transmission studies), which, using both new and established
methods, creates easily interpretable visual representations of the underlying
data analysis.
Together, these contributions provide biologists with a solid framework for extracting
more information from sequencing data faster and with less effort than before.

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Biological Sciences
Depositing User: Jackie Webb
Date Deposited: 19 Jul 2016 14:59
Last Modified: 19 Jul 2016 14:59
URI: https://ueaeprints.uea.ac.uk/id/eprint/59666