Engineering of protein structures II

Introduction

The objective of this tutorial is to evaluate the effect of single-point mutations on protein stability, intermolecular interactions, or protein function, and to design such mutations. We will start by querying for a mutation on the endolysin with a known effect on stability.

Obtaining reference stability data

FireProtDB: https://loschmidt.chemi.muni.cz/fireprotdb/
  • Search FireProtDB for “Endolysin” and keep the results with both ΔTm and ΔΔG data annotated. Note the most destabilizing mutation (R96H), its average effect on the melting temperature (ΔTm = -13.6 °C) and on the Gibbs free energy (ΔΔG = -3.3 kcal/mol) and, after clicking on “Endolysin” in the “Protein” column, the reference Protein Data Bank (PDB) structure (PDB: 2LZM).

Mutational effect prediction on stability using mCSM

mCSM: https://biosig.lab.uq.edu.au/mcsm

Obtain reference structure
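
One possible way to obtain the file 2lzm.pdb used in the following steps is to download it programmatically. The sketch below assumes the RCSB file-server URL pattern https://files.rcsb.org/download/<ID>.pdb; the function names rcsb_pdb_url and download_pdb are illustrative and not part of any tool used in this tutorial. Downloading the file manually from the PDB entry page works equally well.

```python
from pathlib import Path
from urllib.request import urlretrieve


def rcsb_pdb_url(pdb_id: str) -> str:
    """Build the download URL for a PDB entry.

    Assumption: the RCSB file server serves entries as <ID>.pdb
    under https://files.rcsb.org/download/.
    """
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_pdb(pdb_id: str, out_dir: str = ".") -> Path:
    """Download the entry and save it as <lowercase id>.pdb (e.g. 2lzm.pdb)."""
    target = Path(out_dir) / f"{pdb_id.lower()}.pdb"
    urlretrieve(rcsb_pdb_url(pdb_id), str(target))
    return target
```

Calling download_pdb("2lzm") would then save 2lzm.pdb in the current directory (network access required).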

Calculation of the stability effect of a single-point mutation

  • In the “Single mutation” panel, upload the file 2lzm.pdb using the “Browse” button. In the “Mutation” box, enter R96H. Enter A as the chain in the “Mutation chain” box. Click on “Submit”. The results will be shown after a short wait.
  • Does the mCSM prediction agree with the data annotated in FireProtDB?

Calculation of the stability effect on a list of mutations

  • Let us investigate whether any mutation at that position is predicted to have a stabilizing effect. To this end, we will use the second panel, “Mutation list”. Upload the file 2lzm.pdb using the “Browse” button, and the file R96_sat.txt in the “Mutation list file” field. Click on “Submit” and wait for the results.
  • Does mCSM predict any stabilizing mutation?
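
A saturation list such as the one in R96_sat.txt could be generated with a few lines of Python. This is a minimal sketch: the one-mutation-per-line layout with the chain first ("A R96H") is an assumption about the expected file format and should be checked against the example file offered on the mCSM page; saturation_list is a hypothetical helper name.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_list(wild_type: str, position: int, chain: str = "A") -> list[str]:
    """Return all 19 substitutions of one position, one string per mutation.

    Assumed line layout: "<chain> <wild type><position><new residue>".
    """
    return [
        f"{chain} {wild_type}{position}{aa}"
        for aa in AMINO_ACIDS
        if aa != wild_type  # skip the silent R96R "mutation"
    ]


lines = saturation_list("R", 96)
```

The list can then be written to disk, for example with Path("R96_sat.txt").write_text("\n".join(lines) + "\n") after importing Path from pathlib.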

Calculation of the stability effect on saturating a single position

  • Note that in the previous exercise we provided a list that saturates the mutated position R96. That is, our list of mutations included all possible changes of that position to any other amino acid. mCSM allows us to perform this operation more efficiently. In the “Systematic” panel, upload the file 2lzm.pdb using the “Browse” button, indicate position R96 in the “Residue” box, and enter A as the chain in the “Mutation chain” box. Click on “Submit” and wait for the results.
  • Are the results any different from the previous exercise?

Functional effect prediction using PredictSNP

PredictSNP: https://loschmidt.chemi.muni.cz/predictsnp
  • Open the PredictSNP home page in a new browser tab: https://loschmidt.chemi.muni.cz/predictsnp, and click on the “Consensus classifier for prediction of the effect of amino acid substitutions” box.
  • In the “Input” box, paste the sequence of lysozyme available at the Protein Data Bank:
    >2LZM_1
    MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNCNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRCALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
  • Press the “Load” button. In the mutations panel, select position R96 and click on the “ALL” button for mutations. Leave the evaluation tools as selected by default, optionally enter a title for your job, and type your email address in the appropriate box. Click on the “Evaluate!” button and wait for the results, or visit the pre-calculated ones:
  • Precalculated results: PredictSNP
  • From the functional point of view, which are the safest mutations at position R96 of lysozyme?
  • Do the results correspond to the ones previously calculated with mCSM or those reported by FireProtDB?
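
Before submitting a job, it can be useful to confirm that the residue numbering of the pasted sequence matches the mutation of interest. The sketch below checks that position 96 of the 2LZM_1 sequence given above is an arginine and builds the R96H mutant sequence; residue_at and mutate are hypothetical helper names, and the check relies on sequence positions being 1-based.

```python
# Sequence of 2LZM_1 as given in the tutorial (164 residues).
WILD_TYPE = (
    "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNCNGVITK"
    "DEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRCALINMVFQMGETGVAGFTNSLRM"
    "LQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL"
)


def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based sequence position."""
    return sequence[position - 1]


def mutate(sequence: str, mutation: str) -> str:
    """Apply a mutation written like 'R96H' after verifying the wild-type residue."""
    wild_type, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    if residue_at(sequence, position) != wild_type:
        raise ValueError(f"Position {position} is not {wild_type}")
    return sequence[: position - 1] + new + sequence[position:]


mutant = mutate(WILD_TYPE, "R96H")
```

A ValueError here would indicate that the sequence numbering and the mutation label are out of register, which is worth resolving before comparing results across tools.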

Engineering protein surfaces using FuncLib

FuncLib: https://ablift.weizmann.ac.il
  • Step 1: Open the FuncLib home page in a new browser tab: https://ablift.weizmann.ac.il. Type in your institutional email address (@mail.muni.cz), agree with the terms of use, and click “Next”.
  • Step 2: Click on “No” to the question “Do you wish to upload your own PDB file?”, and click “Next”.
  • Step 3: This step is skipped by the system.
  • Step 4: Enter a name for the experiment (e.g. “Lysozyme_Funclib”), type the PDB code for the endolysin, 2lzm, in the “PDB entry” box, and type “A” as “Chains to include in the calculations”. Click on “Next”.
  • Step 5: We are going to diversify the sequence around our position of interest. In the “Amino acid positions to diversify” box, enter 92A 93A 94A 95A 96A 97A 98A 99A 100A. Note that each entry gives an amino acid position followed by its chain, and that entries are separated by a space. Check the essential amino acids for the endolysin in UniProt (under the Function → Features section) and enter the list obtained from there in the “Essential amino acids to fix” box: 11A 20A 32A 104A 117A 132A. Again, the residues are space-separated and each includes the chain name. Leave both the “Ligands to keep during simulation” and the “Ions to keep during simulations” boxes empty, and click “Next”.
  • Step 6: Accept all parameters set by default and click “Next”.
  • Step 7: Accept all parameters set by default and click “Submit”.

  • The server will send the results to the email address you provided. You can check the pre-calculated results at the following address:
  • Precalculated results: FuncLib
  • Unzip the results. Inspect the best_clustered_mutants.csv file using a text editor (or the more or less commands, or any equivalent), and the top50_designs.pse file using PyMOL (or any other 3D structure viewer). Note that although the space to diversify consisted of 9 amino acid positions, some of them might not be diversified at all.
  • Which residues does FuncLib suggest for position R96? Are these in agreement with the previous results obtained by other tools?
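
The position lists entered in Step 5 follow a simple pattern (residue number followed by the chain, separated by spaces), so they can be built and sanity-checked programmatically. This is a minimal sketch; format_positions is a hypothetical helper, and the only format assumed is the one described in Step 5.

```python
def format_positions(positions, chain: str = "A") -> str:
    """Join residue numbers into a space-separated '<number><chain>' list."""
    return " ".join(f"{number}{chain}" for number in positions)


# Window of 9 positions centred on R96, as used in Step 5.
diversify = list(range(92, 101))
# Essential residues listed in Step 5 (taken from UniProt in the tutorial).
essential = [11, 20, 32, 104, 117, 132]

diversify_field = format_positions(diversify)
essential_field = format_positions(essential)

# A residue should not be both diversified and fixed.
overlap = set(diversify) & set(essential)
```

An empty overlap set confirms that none of the essential residues falls inside the diversified window.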
Engineering protein stability using ProteinMPNN

ProteinMPNN: https://huggingface.co/spaces/simonduerr/ProteinMPNN
  • Open the HuggingFace implementation of ProteinMPNN in a new browser tab: https://huggingface.co/spaces/simonduerr/ProteinMPNN. On the “Input” tab, in the “Input structure” box, type the PDB code for the endolysin, 2LZM. On the “Settings” tab, leave “A” as “Designed chain”, increase the number of sequences to 10, leave the default model “vanilla-v_48_020”, and leave the default values for “Sampling temperature” (note the value of 0.1) and “Backbone noise”. To restrict the experiment to the region around our position of interest, R96, we will fix the amino acids in the ranges 1-91 and 101-164. To do so, enter the following in the “Fixed Positions” box: “resid 1 to 91 or resid 101 to 164”. Click on “Run”. Gather the results from the “Designed Sequences” and “T-adjusted probabilities” tabs.
  • How different are the designed sequences and where are the differences?
  • Which residues are suggested for position 96? Do these residues match the suggestions from other tools used previously?
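
To locate the differences between a designed sequence and the wild type, the two can be compared position by position. The sketch below uses the wild-type segment 92-100 of the endolysin and a made-up design (not an actual ProteinMPNN output) purely for illustration; differences is a hypothetical helper name.

```python
def differences(wild_type: str, design: str, first_position: int = 1):
    """List (position, wild-type residue, designed residue) for each mismatch.

    Both sequences must have the same length; first_position is the
    residue number of the first character.
    """
    if len(wild_type) != len(design):
        raise ValueError("Sequences must have the same length")
    return [
        (index, wt, new)
        for index, (wt, new) in enumerate(zip(wild_type, design), start=first_position)
        if wt != new
    ]


# Residues 92-100 of the endolysin sequence used in this tutorial.
wild_type_window = "DAVRRCALI"
# Made-up design for illustration only.
toy_design = "DAVRKCALI"

changes = differences(wild_type_window, toy_design, first_position=92)
```

With full-length sequences copied from the “Designed Sequences” tab, first_position can be left at its default of 1.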

  • Repeat the experiment using a sampling temperature of 0.3.
  • Are the resulting sequences more or less divergent than in the previous experiment?
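
The effect of the sampling temperature can be illustrated with a temperature-scaled softmax. This is a generic sketch that assumes the temperature divides the per-residue scores before normalisation, as is common when sampling from such models; it is not ProteinMPNN's own code, and the scores below are invented toy values.

```python
import math


def temperature_softmax(scores, temperature: float):
    """Convert scores to probabilities after dividing them by the temperature."""
    scaled = [score / temperature for score in scores]
    highest = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(value - highest) for value in scaled]
    total = sum(weights)
    return [weight / total for weight in weights]


# Invented scores for three candidate residues at one position.
toy_scores = [2.0, 1.0, 0.0]

cold = temperature_softmax(toy_scores, 0.1)  # sharply peaked on the top residue
warm = temperature_softmax(toy_scores, 0.3)  # more weight on the alternatives
```

Under this assumption, a higher temperature flattens the distribution, so lower-scoring residues are sampled more often and the designed sequences tend to diverge more.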

Engineering de novo proteins (or protein regions) using RoseTTAFold Diffusion (RFdiffusion)

RoseTTAFold Diffusion: https://colab.research.google.com/drive/1jPrrkcJJv4HUl5eCpHy2VlyLvH2VABV5?usp=sharing
  • Open a new browser tab and go to the dedicated Google Colab notebook: https://colab.research.google.com/drive/1jPrrkcJJv4HUl5eCpHy2VlyLvH2VABV5?usp=sharing. Log in using your credentials.
  • Step 1: Verify GPU usage. Click on “Runtime”, select “Change runtime type”, and check that T4 GPU is selected in the popup.
  • Step 2: Install dependencies. Click on the “Play” button in “setup RFdiffusion”. A security alert may pop up. Click on “Run anyway”.
  • Step 3: Obtain and upload input data. Here, we are going to re-design a protein with a structure predicted by AlphaFold. Download from AlphaFoldDB the 3D structure of Galathowenia oculata haemoglobin: https://alphafold.ebi.ac.uk/entry/A0A6M8AX40. Click on the folder icon in the leftmost pane of your Colab screen. Click on the upload button (it looks like a white page with an upward arrow in the middle) and select the downloaded structure file.
  • Step 4: Specify the job name and the contigs. Fill in the name of your job (e.g. type “Hemoglobin_scaffolding”) in the field “name” under “run RFdiffusion to generate a backbone”. Then fill in the “contigs” field, which specifies which residues from the input structure should be kept (if any) and which should be newly designed. In our case, type “25/A26-206” into the field “contigs”. This means that we are newly designing the first 25 residues of the protein and scaffolding them on top of the remaining residues from the template (residues 26-206 of chain A). For more details on the “contigs”, refer to the README of RFdiffusion (https://github.com/RosettaCommons/RFdiffusion/tree/main) or to the “Instructions” at the end of the notebook.
  • Step 5: Point the tool to our input data and run diffusion. In the workspace panel, on the left of your screen, right-click the name of your input PDB structure, select “copy path”, paste the path into the “pdb” field of the form, and run the cell by pressing the “Play” button.
  • Step 6: Download the results. Scroll to the end of the notebook and click on the “Play” button of the cell entitled “Package and download results”. In the workspace panel, on the left of your screen, right-click the newly created file “Hemoglobin_scaffolding.results.zip”, and click on “Download”. Extract the result file outputs/Hemoglobin_scaffolding_0.pdb from the archive.
  • Step 7: Compare input and output. Open the input structure and the output structure in PyMOL.
  • How different are the sequences? And the structures? What was the effect of RFdiffusion?
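
The meaning of the contig string used in Step 4 can be made explicit with a small parser. This is a simplified sketch and not RFdiffusion's own parser: it only handles segments that are either a plain number (residues to design) or a chain letter followed by a residue range (residues kept from the input), which is enough for “25/A26-206”. The full contig syntax is richer; see the RFdiffusion README. The names parse_contig and total_length are hypothetical.

```python
import re


def parse_contig(contig: str):
    """Split a contig such as '25/A26-206' into designed and kept segments."""
    segments = []
    for part in contig.split("/"):
        kept = re.fullmatch(r"([A-Za-z])(\d+)-(\d+)", part)
        if kept:
            chain, start, end = kept.group(1), int(kept.group(2)), int(kept.group(3))
            segments.append(
                {"kind": "keep", "chain": chain, "start": start, "end": end,
                 "length": end - start + 1}
            )
        elif part.isdigit():
            segments.append({"kind": "design", "length": int(part)})
        else:
            raise ValueError(f"Segment not supported by this sketch: {part!r}")
    return segments


def total_length(segments, kind: str) -> int:
    """Sum the lengths of all segments of one kind ('design' or 'keep')."""
    return sum(s["length"] for s in segments if s["kind"] == kind)


segments = parse_contig("25/A26-206")
designed = total_length(segments, "design")  # 25 newly designed residues
kept = total_length(segments, "keep")        # 181 residues kept from chain A
```

For the contig used here, the designed and kept lengths add up to 206 residues, so the output backbone has the same length as the input structure.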