Mobilizing the Biocatalysis Community for Reproducible and Reusable Data Collection
Authors
Marques, S. M., Planas-Iglesias, J., Velecky, J., Musil, M., Asano, Y., Borowski, T., Brissos, V., Cespugli, M., Chorozian, K., Dadashipour, M., Erdem, E., Ferrandi, E. E., Grigorakis, K., Kluza, A., Lawniczek, J., Makryniotis, K., Monti, D., Nestl, B., Ngo, A. C., Nikolaivits, E., Patti, S., Pentari, C., Rodrigues, C. F., Schopper, T., Seweryn-Ozog, K., Szaleniec, M., Taborda, A., Tataruch, M., Tischler, D., Topakas, E., Wang, J., Wojcik, P., Wojtkiewicz, A. M., Woodley, J. M., Zastawny, O., Martins, L. O., Fraaije, M., Pleiss, J., Schnell, S., Damborsky, J., Mazurenko, S., Bednar, D.
Source
ACS CATALYSIS 16: 8858–8868 (2026)
Abstract
Science is an ever-evolving endeavor, with all new research grounded in knowledge gained in previous studies and publications. This applies not only at the level of theory and fundamental knowledge, but also at the level of specific data. In the context of enzyme research, that includes information on properties such as protein production and folding, solubility, stability, catalytic activity, specificity and stereoselectivity, regulatory effects such as activation and inhibition, and kinetics, all of which are crucial for multiple practical reasons. In the fields of biology and biochemistry, the availability of high-quality experimental data has already contributed to several breakthroughs. One example is AlphaFold 2, (1) a machine learning-based tool released in 2021 that predicts the 3D structures of proteins with unprecedented accuracy. Its release represented a major breakthrough in structural biology, addressing a challenge that had persisted for decades. A key element in the success of AlphaFold was the large number of experimental protein structures available in the Protein Data Bank (ca. 159,000 in 2019). (2) This was possible because the deposition of crystallographic, nuclear magnetic resonance (NMR), and cryogenic electron microscopy (cryo-EM) structures into databases in a uniform format became the gold standard and a strict requirement for publication three decades before the AlphaFold release. (3,4) Thanks to the high quality and large volume of its data, the Protein Data Bank also enabled the development of molecular docking and other tools. Other examples are UniProt (5) and BRENDA, (6) databases that contributed to functional prediction tools, (7−9) metabolic modeling, (10−12) and large-scale enzyme design efforts. (13−16) Their success relies heavily on community contributions, data quality checks, and manual curation.
