Enzymes play a key role in cellular metabolic processes. To assess these processes quantitatively, researchers need to know the enzymes' so-called “turnover number” (kcat for short). In the scientific journal Nature Communications, a team of bioinformaticians from Heinrich Heine University Düsseldorf (HHU) now describes a tool that uses AI methods to predict this parameter for various enzymes.
Enzymes are important biocatalysts in all living cells. They are usually large proteins that bind smaller molecules, the so-called substrates, and convert them into other molecules, the “products.” Without enzymes, this conversion of substrates into products would occur only at a very low rate, if at all. Most organisms possess thousands of different enzymes. Enzymes also have many applications in biotechnological processes and in everyday life, from proving bread dough to detergents.
The turnover number kcat specifies the maximum rate at which a single enzyme molecule can convert substrates into products: the number of substrate molecules that each active site converts per second when it is saturated with substrate. It is an important parameter for quantitative research on enzyme activity and plays a key role in understanding cellular metabolism.
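To make the role of kcat concrete, the sketch below uses the textbook Michaelis–Menten rate law, which the article itself does not discuss: the maximum reaction rate is Vmax = kcat × [E], and at substrate concentration [S] the rate is v = kcat × [E] × [S] / (Km + [S]). The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def max_rate(kcat, enzyme_conc):
    """Vmax = kcat * [E]: the rate when every active site is saturated."""
    return kcat * enzyme_conc


def michaelis_menten_rate(kcat, enzyme_conc, substrate_conc, km):
    """v = kcat * [E] * [S] / (Km + [S]) (textbook Michaelis-Menten rate law)."""
    return max_rate(kcat, enzyme_conc) * substrate_conc / (km + substrate_conc)


# Hypothetical example values (not from the article):
# kcat = 50 per second, [E] = 2 µM, Km = 10 µM.
kcat, enzyme, km = 50.0, 2.0, 10.0
print(max_rate(kcat, enzyme))                       # 100.0 (µM per second)
print(michaelis_menten_rate(kcat, enzyme, km, km))  # 50.0: half of Vmax when [S] = Km
```

Because the rate scales linearly with kcat, an enzyme with a tenfold higher turnover number needs only a tenth as many molecules to reach the same maximum rate.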
However, determining kcat experimentally is time-consuming and expensive, which is why turnover numbers are unknown for the vast majority of reactions. The Computational Cell Biology research group at HHU, headed by Professor Dr Martin Lercher, has now developed a new tool called TurNuP that uses AI methods to predict the kcat values of enzymes.
To train the kcat prediction model, the researchers used deep learning models to convert information about the enzymes and the reactions they catalyse into numerical vectors. These vectors served as the input for a machine learning model, a so-called gradient boosting model, which predicts the kcat values.
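The overall shape of this input pipeline can be sketched as follows. This is a toy illustration under loudly stated assumptions: the published tool uses deep learning models to produce the vectors, whereas the hypothetical functions `enzyme_vector` and `reaction_vector` below use amino-acid composition and a hashed substrate/product "difference" fingerprint as cheap stand-ins. Joining the two parts into one vector by concatenation is also an assumption made for illustration.

```python
import hashlib

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enzyme_vector(sequence):
    """Hypothetical stand-in for a learned protein embedding: amino-acid composition."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


def _bucket(name, n_bits):
    # Deterministic hash (Python's built-in hash() is randomised per process).
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_bits


def reaction_vector(substrates, products, n_bits=32):
    """Hypothetical reaction fingerprint: +1 per substrate, -1 per product, hashed into n_bits slots."""
    vec = [0.0] * n_bits
    for s in substrates:
        vec[_bucket(s, n_bits)] += 1.0
    for p in products:
        vec[_bucket(p, n_bits)] -= 1.0
    return vec


def model_input(sequence, substrates, products, n_bits=32):
    """One numerical vector per enzyme-reaction pair (assumed: simple concatenation)."""
    return enzyme_vector(sequence) + reaction_vector(substrates, products, n_bits)


x = model_input("MKTAYIAKQRQISFVKSHFSRQ", ["ATP", "D-glucose"], ["ADP", "D-glucose 6-phosphate"])
print(len(x))  # 52 = 20 composition values + 32 fingerprint slots
```

Vectors like `x`, paired with measured kcat values, would then form the training set for the gradient boosting model.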
Lead author Alexander Kroll: “TurNuP outperforms previous models and can even be used successfully for enzymes that have only a low similarity to those in the training dataset.” Previous models could not make meaningful predictions unless the enzyme's sequence was at least 40% identical to at least one enzyme in the training set. By contrast, TurNuP makes meaningful predictions even for enzymes whose maximum sequence identity to the training enzymes lies between 0% and 40%.
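The "maximum sequence identity" of an enzyme is its highest percentage of identical aligned amino acids compared with any enzyme in the training set. The sketch below is one simple way to compute it, under assumptions that are not taken from the study: a Needleman–Wunsch global alignment with toy scores (match +1, mismatch −1, gap −1), and identity defined as identical positions divided by alignment length. The helper names `global_identity` and `max_identity` are hypothetical; the authors' alignment tool and identity definition may differ.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Fraction of identical positions in a Needleman-Wunsch global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j]: best alignment score of a[:i] against b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back one optimal alignment, counting its length and identical positions.
    i, j = n, m
    identical = length = 0
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            identical += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    return identical / length if length else 0.0


def max_identity(query, training_sequences):
    """Highest identity between the query and any training sequence."""
    return max(global_identity(query, t) for t in training_sequences)


training = ["MKTAYIAKQR", "MKTAYLAKQR"]  # hypothetical training enzymes
query = "MKSAYIGKQR"
identity = max_identity(query, training)
print(identity)        # 0.8
print(identity < 0.4)  # False: this query falls outside the 0-40% range
```

Binning test enzymes by this value is one way to check how well a model generalises to sequences unlike those it was trained on.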
Professor Lercher adds: “In our study, we show that the predictions made by TurNuP can be used to predict the concentrations of enzymes in living cells much more accurately than has been the case to date.”
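Why better kcat values help to estimate enzyme concentrations can be seen from a simple relation often used in enzyme-constrained metabolic models: a reaction flux v cannot exceed kcat × [E], so carrying that flux requires at least [E] = v / kcat. The sketch below assumes this simplification (enzyme working at full saturation); it is not necessarily the exact procedure used in the study, and the function name `enzyme_demand` is hypothetical.

```python
def enzyme_demand(flux, kcat):
    """Minimum enzyme concentration needed to carry `flux`: [E] >= v / kcat.

    Assumes each enzyme molecule works at most at its turnover number kcat
    (i.e. v <= kcat * [E]); real enzymes are often below saturation, so the
    true concentration can be higher.
    """
    if kcat <= 0:
        raise ValueError("kcat must be positive")
    return flux / kcat


# Hypothetical values: a flux of 100 µM/s through a reaction.
print(enzyme_demand(100.0, 50.0))  # 2.0 µM with kcat = 50 per second
print(enzyme_demand(100.0, 5.0))   # 20.0 µM if kcat were 10 times lower
```

Any error in kcat propagates directly into the estimate: a turnover number that is off by a factor of ten yields an enzyme concentration that is off by the same factor.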
To make the prediction model easily accessible to as many users as possible, the HHU team has developed a user-friendly web server that other researchers can use to predict the kcat values of enzymes.
Link to the web server: https://turnup.cs.hhu.de/
Background: Machine learning and deep learning
Deep learning models consist of multi-layered artificial neural networks that can recognise and process patterns in input data. Such models generally need large training datasets to learn to process their numerical inputs well.
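A "multi-layered" network passes its input through a stack of layers, each computing weighted sums followed by a simple non-linear function. The minimal forward pass below illustrates this with a ReLU activation between layers. The weights are fixed by hand purely for illustration (in practice they are learned from training data), and real models such as those used to embed proteins are vastly larger; all names here are hypothetical.

```python
def dense(x, weights, biases):
    """Fully connected layer: each row of `weights` produces one output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    """Non-linearity applied between layers."""
    return [max(0.0, vi) for vi in v]


def mlp_forward(x, layers):
    """Run x through a list of (weights, biases) layers; ReLU between layers, linear output."""
    for k, (w, b) in enumerate(layers):
        x = dense(x, w, b)
        if k < len(layers) - 1:
            x = relu(x)
    return x


# Two layers with hand-picked (hypothetical) weights: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[2.0, 1.0]], [0.5]),
]
print(mlp_forward([1.0, 2.0], layers))  # [2.0]
```

Training adjusts the numbers in `layers` so that the outputs become useful; with large datasets, the intermediate (hidden) values can serve as numerical representations of the input.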
Gradient boosting is a machine learning method that builds a large number of decision trees one after another, with each new tree trained to correct the errors made by the trees before it. To make a prediction for a specific input, the model combines (adds up) the outputs of all the trees. As with deep learning, the model is refined using training data, i.e. the training data are used to produce the decision trees.
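The idea can be shown with a minimal gradient boosting regressor for squared-error loss, where each "tree" is a single split (a decision stump) on one numerical feature. For this loss, the errors each new tree is fitted to are simply the residuals, i.e. observed minus current predicted values. This is a teaching sketch with hypothetical function names; production libraries such as XGBoost or LightGBM use deeper trees, many features and regularisation, and this is not presented as TurNuP's actual configuration.

```python
def fit_stump(xs, residuals):
    """Best single-split regression tree on 1-D inputs (minimises squared error)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    if best is None:  # all inputs identical: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return values[0], mean, mean
    _, t, lmean, rmean = best
    return t, lmean, rmean


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gradient_boosting(xs, ys, n_trees=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # start from the mean target value
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    """Combine all trees: base value plus the scaled sum of every stump's output."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = fit_gradient_boosting(xs, ys, n_trees=50, learning_rate=0.1)
print(round(predict(model, 2), 2), round(predict(model, 5), 2))  # 1.01 4.99
```

The small learning rate means each tree corrects only part of the remaining error, so many trees are needed; this gradual correction is what the "boosting" refers to.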