TY - JOUR
T1 - On the use of direct-coupling analysis with a reduced alphabet of amino acids combined with super-secondary structure motifs for protein fold prediction
AU - Anton, Bernat
AU - Besalú, Mireia
AU - Fornes, Oriol
AU - Bonet, Jaume
AU - Molina, Alexis
AU - Molina-Fernandez, Ruben
AU - De Las Cuevas, Gemma
AU - Fernandez-Fuentes, Narcis
AU - Oliva, Baldo
N1 - © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
PY - 2021/6/30
Y1 - 2021/6/30
N2 - Direct-coupling analysis (DCA) for studying the coevolution of residues in proteins has been widely used to predict the three-dimensional structure of a protein from its sequence. We present RADI/raDIMod, a variation of the original DCA algorithm that groups chemically equivalent residues combined with super-secondary structure motifs to model protein structures. Interestingly, the simplification produced by grouping amino acids into only two groups (polar and non-polar) is still representative of the physicochemical nature that characterizes the protein structure and it is in line with the role of hydrophobic forces in protein-folding funneling. As a result of a compressed alphabet, the number of sequences required for the multiple sequence alignment is reduced. The number of long-range contacts predicted is limited; therefore, our approach requires the use of neighboring sequence-positions. We use the prediction of secondary structure and motifs of super-secondary structures to predict local contacts. We use RADI and raDIMod, a fragment-based protein structure modelling, achieving near native conformations when the number of super-secondary motifs covers >30-50% of the sequence. Interestingly, although different contacts are predicted with different alphabets, they produce similar structures.
AB - Direct-coupling analysis (DCA) for studying the coevolution of residues in proteins has been widely used to predict the three-dimensional structure of a protein from its sequence. We present RADI/raDIMod, a variation of the original DCA algorithm that groups chemically equivalent residues combined with super-secondary structure motifs to model protein structures. Interestingly, the simplification produced by grouping amino acids into only two groups (polar and non-polar) is still representative of the physicochemical nature that characterizes the protein structure and it is in line with the role of hydrophobic forces in protein-folding funneling. As a result of a compressed alphabet, the number of sequences required for the multiple sequence alignment is reduced. The number of long-range contacts predicted is limited; therefore, our approach requires the use of neighboring sequence-positions. We use the prediction of secondary structure and motifs of super-secondary structures to predict local contacts. We use RADI and raDIMod, a fragment-based protein structure modelling, achieving near native conformations when the number of super-secondary motifs covers >30-50% of the sequence. Interestingly, although different contacts are predicted with different alphabets, they produce similar structures.
UR - http://www.scopus.com/inward/record.url?scp=85123224632&partnerID=8YFLogxK
U2 - 10.1093/nargab/lqab027
DO - 10.1093/nargab/lqab027
M3 - Article
C2 - 33937764
AN - SCOPUS:85123224632
VL - 3
JO - NAR Genomics and Bioinformatics
JF - NAR Genomics and Bioinformatics
IS - 2
M1 - lqab027
ER -