Fast structural search for classification of gut bacterial mucin O-glycan degrading enzymes
Article 2026
Authors
Mert Erden
Tyler J. Schult
Karin Yanagi
Abstract
The Enzyme Commission (EC) numbering scheme provides a hierarchical way to classify enzymes according to their catalytic functions. While recent protein language model (PLM)-based approaches such as CLEAN and ProteInfer have improved sequence-based EC number prediction, they struggle with fine-grained classification at the deepest hierarchical level. Structure-based approaches that group similar proteins using alignment tools excel at finding proteins that share overall global structure, but suffer from high false-positive rates when classifying proteins that are globally similar in structure but whose functional differences depend on a localized region. This problem is particularly relevant to EC number prediction, because an enzyme’s function depends on its catalytic domain, a relatively small, specific region of the protein. We introduce Deep Enzyme Function Transfer (DEFT), which harmonizes sequence- and structure-based approaches through the key insight that PLM-based annotation of the first two levels of the EC number hierarchy vastly reduces the false positives that are likely to arise in purely structure-based EC number prediction. Given an enzyme of interest, DEFT first uses a PLM-based method to assign the first two levels of the enzyme’s EC number, and then uses a structure-based method to predict the remaining two levels. Using benchmarking datasets, we demonstrate that DEFT achieves superior accuracy compared with current state-of-the-art tools for EC number prediction. Furthermore, we show that DEFT’s computational efficiency enables high-throughput, genome-wide annotation of the total enzyme repertoires of organisms. We illustrate this capability by experimentally validating DEFT-predicted glycoside hydrolase (GH) profiles of intestinal mucus-associated bacteria.
Author summary
Enzymes are ubiquitous proteins that catalyze chemical reactions in living cells.
Enzymes are classified using a hierarchical numbering system called Enzyme Commission (EC) numbers, which describe the chemical reactions the enzymes catalyze, from a general reaction type (e.g., breaking bonds or transferring chemical groups) to more specific aspects such as the chemical bonds and substrates involved in the reaction. We present a new machine learning method for predicting EC numbers called Deep Enzyme Function Transfer (DEFT). This method improves on previous methods that use either protein sequence-based or three-dimensional (3D) structure-based comparisons between enzymes of known and unknown classification. DEFT combines the strengths of both approaches by first using a protein sequence-based model to predict the general enzyme category and then using protein structure comparisons to predict the finer subcategories. We demonstrate that DEFT achieves superior accuracy compared with current state-of-the-art tools for EC number prediction. We next demonstrate how DEFT’s computational efficiency enables high-throughput, genome-wide annotation of organisms’ enzyme repertoires. We illustrate this capability by experimentally validating DEFT-predicted profiles of sugar-metabolizing enzymes in intestinal mucus-associated bacteria.
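The two-stage idea described above can be illustrated with a minimal Python sketch. It is not the DEFT implementation: the `StructureHit` record, the `predict_prefix` callable standing in for the PLM-based stage, the top-hit transfer rule, and all scores and reference identifiers are assumptions made for illustration. The sketch only shows how restricting structural hits to the predicted first two EC levels can discard a globally similar reference from a different enzyme class.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class StructureHit:
    """One structural-alignment hit against a reference enzyme (hypothetical record)."""
    reference_id: str
    ec_number: str  # full four-level EC number of the reference, e.g. "3.2.1.18"
    score: float    # alignment score; higher means more similar (assumed convention)


def ec_prefix(ec_number: str, levels: int = 2) -> str:
    """Return the first `levels` fields of an EC number, e.g. "3.2.1.18" -> "3.2"."""
    return ".".join(ec_number.split(".")[:levels])


def two_stage_ec(
    sequence: str,
    predict_prefix: Callable[[str], str],
    hits: Iterable[StructureHit],
) -> Optional[str]:
    """Stage 1: a sequence model predicts the first two EC levels.
    Stage 2: among structural hits sharing that prefix, transfer the full
    EC number of the best-scoring hit (top-hit transfer is an assumption)."""
    prefix = predict_prefix(sequence)
    compatible = [h for h in hits if ec_prefix(h.ec_number) == prefix]
    if not compatible:
        return None  # no structurally similar reference within the predicted class
    return max(compatible, key=lambda h: h.score).ec_number


# Made-up hits: the best global match belongs to a different class (2.4),
# so a purely structure-based top-hit transfer would pick the wrong EC number.
example_hits = [
    StructureHit("refA", "2.4.1.1", 0.92),
    StructureHit("refB", "3.2.1.18", 0.88),
    StructureHit("refC", "3.2.1.52", 0.75),
]

# Stand-in for the PLM-based stage: always predicts the prefix "3.2".
predicted = two_stage_ec("MKTAYIAK", lambda seq: "3.2", example_hits)
print(predicted)  # 3.2.1.18
```

In this toy case, the prefix filter removes `refA` even though it has the highest alignment score, which is the kind of false positive the abstract attributes to purely structure-based prediction.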