Investigation of sequence features of hinge-bending regions in proteins with domain movements using kernel logistic regression

Veevers, Ruth, Cawley, Gavin ORCID: https://orcid.org/0000-0002-4118-9095 and Hayward, Steven ORCID: https://orcid.org/0000-0001-6959-2604 (2020) Investigation of sequence features of hinge-bending regions in proteins with domain movements using kernel logistic regression. BMC Bioinformatics, 21. ISSN 1471-2105

PDF (Published Version): available under a Creative Commons Attribution licence.

Abstract

Background: Hinge-bending movements in proteins comprising two or more domains form a large class of functional movements. Hinge-bending regions demarcate protein domains and collectively control the domain movement. Consequently, the ability to recognise sequence features of hinge-bending regions and to predict them from sequence alone would benefit various areas of protein research. For example, an understanding of how the sequence features of these regions relate to dynamic properties in multi-domain proteins would aid in the rational design of linkers in therapeutic fusion proteins.
Results: The DynDom database of protein domain movements comprises sequences annotated to indicate whether each amino acid residue is located within a hinge-bending region or within an intradomain region. We used these data, together with statistical methods and Kernel Logistic Regression (KLR) models, to determine sequence features that favour or disfavour hinge-bending regions. This is a difficult classification problem because the number of negative cases (intradomain residues) is much larger than the number of positive cases (hinge residues). The statistical methods and the KLR models both show that cysteine has the lowest propensity for hinge-bending regions and proline has the highest, even though proline is the most rigid amino acid. As hinge-bending regions have previously been shown to occur frequently at the terminal regions of secondary structures, the propensity for proline at these regions is likely due to its tendency to break secondary structures. The KLR models also indicate that isoleucine may act as a domain-capping residue. We have found that a quadratic KLR model outperforms a linear KLR model and that performance continues to improve up to very long window lengths (eighty residues), indicating long-range correlations.
Conclusion: In contrast to the only other approach that focused solely on interdomain hinge-bending regions, the method provides a modest and statistically significant improvement over a random classifier. One explanation of the KLR results is that, in the prediction of hinge-bending regions, a long-range correlation is at play between a small number of amino acids that either favour or disfavour hinge-bending regions. The resulting sequence-based prediction tool, HingeSeek, is available to run through a webserver at hingeseek.cmp.uea.ac.uk.
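
The abstract does not define how residue propensity was calculated. As an illustration only, the sketch below uses one common definition: the frequency of an amino acid among hinge residues divided by its frequency among all residues, so that values above 1 favour hinge regions and values below 1 disfavour them. The function name `hinge_propensities`, the label character `H` for hinge residues, and the toy sequences are assumptions made for this example and are not taken from the paper or from DynDom.

```python
from collections import Counter


def hinge_propensities(sequences, labels, hinge_char="H"):
    """Return {amino_acid: propensity} for hinge-bending regions.

    Assumed definition (not confirmed by the source):
    propensity = (share of hinge residues that are this amino acid)
               / (share of all residues that are this amino acid).
    """
    hinge_counts = Counter()
    total_counts = Counter()
    for seq, lab in zip(sequences, labels):
        if len(seq) != len(lab):
            raise ValueError("sequence and label strings must have equal length")
        for aa, flag in zip(seq, lab):
            total_counts[aa] += 1
            if flag == hinge_char:
                hinge_counts[aa] += 1

    n_hinge = sum(hinge_counts.values())
    n_total = sum(total_counts.values())
    if n_hinge == 0:
        raise ValueError("no hinge residues in the input")

    return {
        aa: (hinge_counts[aa] / n_hinge) / (total_counts[aa] / n_total)
        for aa in total_counts
    }


# Hypothetical toy data: one label character per residue,
# "H" for hinge-bending region, "-" for intradomain.
toy_sequences = ["ACPA", "CPAA"]
toy_labels = ["--H-", "-H--"]
props = hinge_propensities(toy_sequences, toy_labels)
```

In this toy input both hinge residues are proline, so proline scores above 1 and the other residues score 0; real annotated data would give a far less extreme spread.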

Item Type: Article
Uncontrolled Keywords: domain closure, hinge axis, linker region, protein conformational change, structural biology, biochemistry, molecular biology, computer science applications, applied mathematics
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Centre for Ocean and Atmospheric Sciences
Faculty of Science > Research Groups > Data Science and Statistics
Faculty of Science > Research Groups > Computational Biology
Depositing User: LivePure Connector
Date Deposited: 24 Mar 2020 01:52
Last Modified: 21 Apr 2023 00:27
URI: https://ueaeprints.uea.ac.uk/id/eprint/74605
DOI: 10.1186/s12859-020-3464-3
