TY - JOUR
T1 - Lineage-specific microbial protein prediction enables large-scale exploration of protein ecology within the human gut
AU - Schmitz, Matthias A.
AU - Dimonaco, Nicholas J.
AU - Clavel, Thomas
AU - Hitch, Thomas C. A.
N1 - © 2025. The Author(s).
PY - 2025/4/3
Y1 - 2025/4/3
N2 - Microbes use a range of genetic codes and gene structures, yet these are often ignored during metagenomic analysis. This causes spurious protein predictions, preventing functional assignment which limits our understanding of ecosystems. To resolve this, we developed a lineage-specific gene prediction approach that uses the correct genetic code based on the taxonomic assignment of genetic fragments, removes incomplete protein predictions, and optimises prediction of small proteins. Applied to 9634 metagenomes and 3594 genomes from the human gut, this approach increased the landscape of captured expressed microbial proteins by 78.9 including previously hidden functional groups. Optimised small protein prediction captured 3,772,658 small protein clusters, which form an improved microbial protein catalogue of the human gut (MiProGut). To enable the ecological study of a protein's prevalence and association with host parameters, we developed InvestiGUT, a tool which integrates both the protein sequences and sample metadata. Accurate prediction of proteins is critical to providing a functional understanding of microbiomes, enhancing our ability to study interactions between microbes and hosts.
AB - Microbes use a range of genetic codes and gene structures, yet these are often ignored during metagenomic analysis. This causes spurious protein predictions, preventing functional assignment which limits our understanding of ecosystems. To resolve this, we developed a lineage-specific gene prediction approach that uses the correct genetic code based on the taxonomic assignment of genetic fragments, removes incomplete protein predictions, and optimises prediction of small proteins. Applied to 9634 metagenomes and 3594 genomes from the human gut, this approach increased the landscape of captured expressed microbial proteins by 78.9 including previously hidden functional groups. Optimised small protein prediction captured 3,772,658 small protein clusters, which form an improved microbial protein catalogue of the human gut (MiProGut). To enable the ecological study of a protein's prevalence and association with host parameters, we developed InvestiGUT, a tool which integrates both the protein sequences and sample metadata. Accurate prediction of proteins is critical to providing a functional understanding of microbiomes, enhancing our ability to study interactions between microbes and hosts.
UR - https://www.scopus.com/pages/publications/105002836173
U2 - 10.1038/s41467-025-58442-w
DO - 10.1038/s41467-025-58442-w
M3 - Article
C2 - 40180917
SN - 2041-1723
VL - 16
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 3204
ER -