Crossman, Lisa (2019) Punchline: Identifying and comparing significant Pfam protein domain differences across draft whole genome sequences.
Full text not available from this repository.
Abstract
Motivation: Short-read, paired-end Illumina draft assemblies can be fragmented into many contigs and affected by repeat regions, which arise from mobile element activity within the genome or from inherently repetitive gene structure. Annotating such assemblies for function and analysing their gene content can be challenging when predicted genes are split across contigs. This often occurs within specific families of genes, such as longer genes with repeating domains, genes encoding several transmembrane domains, and genes of unusual nucleotide content. These genes are often virulence determinants, so losing these specific types of data can seriously affect downstream studies.
Results: Rather than studying the predicted gene content of draft genomes, we examined predicted protein content using the Pfam domain complements of predicted proteins. We produced a workflow, Punchline, to study the genetic content of draft contig assemblies by looking at the complement of short domains, which are unlikely to be affected by assembly fragmentation. We investigated a Bacteroides ovatus dataset, grouping isolates by the vertebrate host from which each organism was isolated, and identified potential host-restricted functions and host-restricted phylogenetic clustering.
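The idea of comparing domain complements between host groups can be illustrated with a minimal sketch. This is not the Punchline workflow itself, whose implementation is not described in this record. It assumes that each genome has already been reduced to a set of domain identifiers, and it swaps in a two-sided Fisher's exact test on presence/absence counts as one plausible way to flag group differences. The function names, the input dictionaries and the choice of test are all assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def domain_differences(domains_by_genome, group_by_genome, group_a, group_b):
    """Hypothetical helper: rank domains by presence/absence difference
    between two groups of genomes.

    domains_by_genome: dict mapping genome name -> set of domain identifiers
    group_by_genome:   dict mapping genome name -> group label (e.g. host)
    Returns a list of (domain, n_present_a, n_present_b, p_value), sorted by p.
    """
    in_a = [g for g, grp in group_by_genome.items() if grp == group_a]
    in_b = [g for g, grp in group_by_genome.items() if grp == group_b]
    all_domains = set().union(*domains_by_genome.values())

    results = []
    for dom in sorted(all_domains):
        a = sum(dom in domains_by_genome[g] for g in in_a)
        c = sum(dom in domains_by_genome[g] for g in in_b)
        p = fisher_exact_two_sided(a, len(in_a) - a, c, len(in_b) - c)
        results.append((dom, a, c, p))
    return sorted(results, key=lambda r: r[3])
```

Calling `domain_differences(domains, hosts, "host_1", "host_2")` on such dictionaries returns the domains ordered from most to least group-specific. In a real analysis with thousands of domains, the p-values would also need a multiple-testing correction, which this sketch omits.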
Item Type: | Article |
---|---|
Faculty \ School: | Faculty of Science > School of Biological Sciences |
Depositing User: | LivePure Connector |
Date Deposited: | 15 May 2024 13:30 |
Last Modified: | 15 May 2024 13:30 |
URI: | https://ueaeprints.uea.ac.uk/id/eprint/95207 |
DOI: | 10.1101/686543 |