FireProtDB
v2.0
Mutational data for protein stability
Loschmidt Laboratories

Acknowledgements

Data sources

ProTherm

Megascale

  • The Megascale dataset comprises over 700,000 high-quality experimental measurements of protein folding stability, systematically covering single and selected double amino acid variants across hundreds of natural and de novo designed protein domains. It was generated using a high-throughput cDNA display proteolysis method.
  • Link: https://zenodo.org/records/7992926
  • Reference: https://pubmed.ncbi.nlm.nih.gov/37468638/

Domainome dataset

  • A large-scale experimental analysis of Human Domainome 1, a library containing more than 500,000 missense variants across more than 500 human protein domains. The dataset comprises two parts: the first contains predicted ddG values averaged over a set of homologous proteins, while the second reports the general fitness of each protein and its mutated variants.
  • Link: https://zenodo.org/records/14356805
  • Reference: https://pubmed.ncbi.nlm.nih.gov/39779847/

VariBench

  • VariBench contains datasets of experimentally verified high-quality variation data carefully chosen from literature and relevant databases. It provides the mapping of variation position to different levels (protein, RNA and DNA sequences, protein three-dimensional structure), along with identifier mapping to relevant databases.
  • Link: https://structure.bmc.lu.se/VariBench/
  • Reference: https://pubmed.ncbi.nlm.nih.gov/22903802/

UniProt

  • A comprehensive, expertly curated database providing a central repository for protein sequences and functional information, including details on protein function, classification, and cross-references to numerous other biological resources. It integrates data from various sources to offer the most complete and richly annotated catalog of proteins.
  • Link: https://www.uniprot.org/
  • Reference: https://pubmed.ncbi.nlm.nih.gov/36408920/

InterPro

  • An integrated database that classifies protein sequences into families and predicts the presence of functional domains and important sites by combining predictive models (signatures) from multiple specialized member databases. It provides a unified view of protein classification, enhancing the functional characterization of new protein sequences.
  • Link: https://www.ebi.ac.uk/interpro/
  • Reference: https://pubmed.ncbi.nlm.nih.gov/36350672/

PDB database

  • PDB is the global archive for experimentally determined three-dimensional structural data of large biological molecules, such as proteins and nucleic acids. It provides atomic coordinates, experimental details (e.g., from X-ray crystallography, NMR, cryo-EM), and associated information, serving as a fundamental resource for understanding molecular function and design.
  • Link: https://www.rcsb.org/
  • Reference: https://pubmed.ncbi.nlm.nih.gov/10592235/

COZYME

  • COZYME is a European network focused on advancing protein engineering through the application of machine learning. Its primary goal is to foster collaboration among researchers across Europe to develop and apply cutting-edge approaches for designing and optimizing proteins with desired properties. It acts as a collaborative platform for sharing knowledge, developing best practices, and organizing scientific events (workshops, conferences) related to machine learning in protein engineering.
  • Link: https://cozyme.eu/

Funding

The authors thank the RECETOX, e-INFRA, and ELIXIR CZ Research Infrastructures (nos. LM2023069, 90254, and LM2023055), financed by the Czech Ministry of Education, Youth and Sports; the Grant Agency of the Czech Republic (no. 25-18233M); and the Technology Agency of the Czech Republic (TACR NPO TEREP TN02000122/001N). This work was also supported by the European Union’s Horizon 2020 research and innovation programme (no. 857560 TEAMING), the European Union Centre of Excellence CLARA (no. 101136607), COST Action COZYME (no. CA21162), and Brno University of Technology (no. FIT-S-23-8209). This publication reflects only the author’s view, and the European Commission is not responsible for any use that may be made of the information it contains. Funding to pay the Open Access publication charges for this article was provided by the European Union’s Horizon 2020 research and innovation programme (no. 857560 TEAMING).