On the use of direct-coupling analysis with a reduced alphabet of amino acids combined with super-secondary structure motifs for protein fold prediction

Bernat Anton, Mireia Besalú, Oriol Fornes, Jaume Bonet, Alexis Molina, Ruben Molina-Fernandez, Gemma De Las Cuevas, Narcis Fernandez-Fuentes, Baldo Oliva*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Direct-coupling analysis (DCA), which studies the coevolution of residues in proteins, has been widely used to predict the three-dimensional structure of a protein from its sequence. We present RADI/raDIMod, a variation of the original DCA algorithm that groups chemically equivalent residues and is combined with super-secondary structure motifs to model protein structures. Interestingly, the simplification produced by grouping amino acids into only two groups (polar and non-polar) is still representative of the physicochemical nature that characterizes the protein structure, and it is in line with the role of hydrophobic forces in protein-folding funneling. Because the alphabet is compressed, fewer sequences are required for the multiple sequence alignment. The number of long-range contacts predicted is limited; therefore, our approach requires the use of neighboring sequence positions. We use the prediction of secondary structure and of super-secondary structure motifs to predict local contacts. We use RADI together with raDIMod, a fragment-based protein structure modelling approach, and achieve near-native conformations when super-secondary motifs cover more than 30-50% of the sequence. Interestingly, although different contacts are predicted with different alphabets, they produce similar structures.
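The two-group alphabet reduction described in the abstract can be illustrated with a minimal Python sketch. This is not the RADI implementation: the assignment of residues to the polar and non-polar groups is an assumption made for illustration (the abstract does not list it), the function names are hypothetical, and plain mutual information between alignment columns is used as a much simpler stand-in for the direct-coupling score.

```python
import math
from collections import Counter

# Assumed grouping, for illustration only; the abstract does not list
# which residues fall in each group.
NONPOLAR = set("AVLIMFWPGC")
POLAR = set("STNQYHKRDE")
GAP = "-"


def reduce_sequence(seq):
    """Map residues to 'H' (non-polar) or 'P' (polar); anything else becomes a gap."""
    out = []
    for aa in seq.upper():
        if aa in NONPOLAR:
            out.append("H")
        elif aa in POLAR:
            out.append("P")
        else:
            out.append(GAP)
    return "".join(out)


def mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of a reduced alignment.

    This is a simple covariation measure, not DCA: it does not separate
    direct from indirect couplings.
    """
    pairs = [(s[i], s[j]) for s in msa if s[i] != GAP and s[j] != GAP]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    col_i = Counter(a for a, _ in pairs)
    col_j = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log((count * n) / (col_i[a] * col_j[b]))
    return mi


# Toy alignment (made-up sequences, not data from the paper).
toy_msa = ["AKLE", "VRID", "SKTE", "TDSA"]
reduced = [reduce_sequence(s) for s in toy_msa]
```

With only two symbols per column, each pair of positions has 4 joint states instead of 400 for the full 20-letter alphabet, which gives some intuition for why fewer aligned sequences may be needed to estimate the pairwise statistics.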

Original language: English
Article number: lqab027
Journal: NAR Genomics and Bioinformatics
Volume: 3
Issue number: 2
Early online date: 22 Apr 2021
Publication status: Published - 30 Jun 2021
