Experimentally Validated Deep Learning Control of Protein Aggregation
Authors
Cima, V., Kunka, A., Planas-Iglesias, J., Grakova, E., Havlasek, M., Subramanian, M., Beloch, M., Marek, M., Slaninova, K., Damborsky, J., Prokop, Z., Bednar, D., Martinovic, J.
Source
COMMUNICATIONS CHEMISTRY XX: xxx-xxx (2026)
Abstract
Identifying aggregation-prone regions in proteins and suppressing them through mutations is a powerful strategy to enhance protein solubility and yield, significantly expanding the potential applications of these proteins. Here, we developed and experimentally validated a deep neural network-based predictor, AggreProt, that generates a residue-level aggregation profile for protein sequences. The model outperformed or matched current state-of-the-art algorithms, as validated on two independent datasets comprising hexapeptides and full-length proteins with annotated aggregation-prone regions. Importantly, we validated the model experimentally using a set of 34 hexapeptides identified in the model protein haloalkane dehalogenase LinB, along with seven proteins from the AmyPro database. Experimental results agreed with our predictions in 79% of cases and revealed inaccuracies in some database annotations. Finally, the algorithm’s utility was demonstrated by identifying aggregation-prone regions in the LinB enzyme and designing mutations to suppress aggregation in its exposed regions. The resulting variants exhibited reduced aggregation propensity, improved solubility, and up to a 100% increase in yield compared to the wild type. AggreProt is freely available to the scientific community via a user-friendly web server: https://loschmidt.chemi.muni.cz/aggreprot.
