Novel Algorithms and Methodology to Help Unravel Secrets that Next Generation Sequencing Data Can Tell

Popescu, Andrei-Alin (2015) Novel Algorithms and Methodology to Help Unravel Secrets that Next Generation Sequencing Data Can Tell. Doctoral thesis, University of East Anglia.

Abstract

The genome of an organism is its complete set of DNA nucleotides, spanning
all of its genes as well as its non-coding regions. It contains most of
the information necessary to build and maintain an organism. It is therefore
no surprise that sequencing the genome provides an invaluable tool for
the scientific study of an organism. Via the inference of an evolutionary
(phylogenetic) tree, DNA sequences can be used to reconstruct the evolutionary
history of a set of species. DNA sequences, or genotype data, have
also proven useful for predicting an organism’s phenotype (i.e. observed
traits) from its genotype. This is the objective of association studies.
While methods for finding the DNA sequence of an organism have existed
for decades, the recent advent of Next Generation Sequencing (NGS) has
meant that the availability of such data has increased to such an extent
that the computational challenges that now form an integral part of biological
studies can no longer be ignored. By focusing on phylogenetics
and Genome-Wide Association Studies (GWAS), this thesis aims to help
address some of these challenges. As a consequence, this thesis is in two
parts, the first centring on phylogenetics and the second on
GWAS.
In the first part, we present theoretical insights for reconstructing phylogenetic
trees from incomplete distances. This problem is important in the
context of NGS data as incomplete pairwise distances between organisms
occur frequently with such input, and ignoring taxa for which information
is missing can introduce undesirable bias. In the second part, we focus on
the problem of inferring population stratification between individuals in a
dataset due to reproductive isolation. While powerful methods for doing
this have been proposed in the literature, they tend to struggle when faced
with the sheer volume of data that comes with NGS. To help address this
problem we introduce the novel PSIKO software and show that it scales
very well when dealing with large NGS datasets.

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Computing Sciences
Depositing User: Mia Reeves
Date Deposited: 04 May 2016 12:02
Last Modified: 04 May 2016 12:02
URI: https://ueaeprints.uea.ac.uk/id/eprint/58571
