Protein structure prediction

Prediction of secondary structure and residue burial/surface accessibility

SABLE (http://sable.cchmc.org/), TOPCONS (http://topcons.cbr.su.se/)

Using the tool SABLE, predict the secondary structure of the protein DrbA:

>DrbA
MSCRLSSNRRGSSKLAAMTNLASDLFPHPSSELSIDGHTLRYIDTAASSDIPSSAVGSSDGEPTFLCVHGNPTWSFYYRRIIERYGKQQRVIAVDHIGCGRSDKPSEDEFPYTMAAHRDNLIRLVDELDLKNVILIAHDWGGAIGLSAMHARRDRLAGIGLLNTAAFPPPYMPQRIAACRMPVLGTPAVRGLNLFARAAVTMAMSRTKMKPDVAAGLLAPYDNWKNRVAIDRFVRDIPLNDSHPTMKTLRQLESDLPDLASLPISLIWGMKDWCFRPECLRRFQSVWPDAEVTELATTGHYVIEDSPEETLAAIDSLLARVKERIGAA
- precomputed results: SABLE
How many α-helices and how many β-strands did the tool predict?
What is predicted for the following amino acid residues: G71, V92, and E279?
Which of these residues is predicted to be the most exposed (highest relative solvent accessibility)?
Using the tool TOPCONS, assess whether DrbA is a transmembrane protein.
- precomputed results: TOPCONS
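
The counting and look-up questions above can be sketched in a few lines of Python. This is only an illustration: it assumes the prediction has been copied into a per-residue string that uses `H` for helix, `E` for strand, and `C` for coil, and the sequence and prediction shown are toy data, not the DrbA result. The function names are hypothetical.

```python
from itertools import groupby


def count_segments(ss: str) -> dict:
    """Count contiguous runs (segments) of each secondary-structure state."""
    counts = {}
    for state, _run in groupby(ss):
        counts[state] = counts.get(state, 0) + 1
    return counts


def lookup(sequence: str, ss: str, position: int) -> tuple:
    """Return (residue, predicted state) for a 1-based residue position."""
    return sequence[position - 1], ss[position - 1]


# Toy data for illustration only (NOT the DrbA prediction).
toy_seq = "MKVLAAGHEE"
toy_ss = "CHHHHCEEEC"

print(count_segments(toy_ss))      # segments per state
print(lookup(toy_seq, toy_ss, 3))  # residue 3 and its predicted state
```

Checking the returned residue letter against the expected one (for example, that position 71 really is a glycine) is a quick way to catch off-by-one errors in numbering.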

Homology modeling

RCSB PDB (https://www.rcsb.org), SWISS-MODEL (https://swissmodel.expasy.org/), QMEAN (https://swissmodel.expasy.org/qmean/)

Optional:

ModWeb server, a web interface to MODELLER (https://modbase.compbio.ucsf.edu/modweb/)

Target protein haloalkane dehalogenase from Tepidicaulis marinus (UniProtKB accession: A0A081BEP5), sequence:

>A0A081BEP5
MKVLRTPDACFEGLEDYPFTPHYHEFKDADGTPLRLHYVEEGPKDAAPVLLMHGEPSWSFLYRHMVRGLAEKGHRVLAPDLIGFGKSDKPAEQEDYTFERHVAWMSDWLTGLDLKNITLFCQDWGGLIGLRLVAAFPERFARVVVANTGLPIGTGWSEAFKQWLDFSQSTPVLPVADIVNGGSVRDFSEADKKAYDAPFPDESFKAGARRFPALVPITPEHPSVEENKAAWKVLEAFEKPFLTAFSDQDPVTKGGDKIFQERVPGAKGQPHVTIEGGGHFLQEDKPAELVELIDGFIKRTA
Prepare a homology model using the automatic server SWISS-MODEL, which uses the ProMod3 algorithm.
- precomputed results: SWISS-MODEL
Study the resulting model and estimate what quality you can expect from it. Focus on the sequence identity to the template, the extent of the modeled structure (coverage of the target sequence), and the quality of the template structure.
Evaluate the chosen model as well as the template structure using the QMEAN server.
- precomputed QMEAN results: homology model, AlphaFoldDB model, and template (experimental structure)
Compare the evaluated model and template using PyMOL.
- display the QMEAN score using the command: spectrum b, red_blue, minimum=0, maximum=1
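
The coloring command above reads the B-factor column of the structure file. The sketch below reads that same column directly so the scores can be summarized numerically. It assumes that the file downloaded from QMEAN is in PDB format and stores the local score (0 to 1) in the B-factor field; this is inferred from the range used in the command and should be checked against the actual file. The helper `toy_atom` only builds fixed-width toy records for the demonstration and is not part of any library.

```python
def ca_bfactors(pdb_text: str) -> dict:
    """Map (chain, residue number) -> B-factor value of the CA atom."""
    scores = {}
    for line in pdb_text.splitlines():
        # PDB is a fixed-column format: atom name in columns 13-16,
        # chain in 22, residue number in 23-26, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def toy_atom(serial, name, resname, chain, resnum, b):
    """Build a toy ATOM record with correct column positions (illustration only)."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resnum:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}")


# Toy structure: two residues with made-up local scores.
toy_pdb = "\n".join([
    toy_atom(1, "N", "MET", "A", 1, 0.80),
    toy_atom(2, "CA", "MET", "A", 1, 0.80),
    toy_atom(3, "CA", "LYS", "A", 2, 0.40),
])

scores = ca_bfactors(toy_pdb)
mean_score = sum(scores.values()) / len(scores)
worst = min(scores, key=scores.get)  # residue with the lowest local score
print(mean_score, worst)
```

For a real file, replace `toy_pdb` with the file contents, for example `open("model.pdb").read()`, where `model.pdb` is a placeholder file name.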

Optionally, you can also prepare a homology model using the ModWeb server. You will find the license key in the study materials of Bi9410cen.

Structure prediction using AlphaFold3

AlphaFold Database (https://alphafold.ebi.ac.uk/), AlphaFold3 server (https://alphafoldserver.com/), ColabFold (https://github.com/sokrypton/ColabFold), QMEAN (https://swissmodel.expasy.org/qmean/)

Visit AlphaFoldDB and open the entry for the free fatty acid receptor 2 protein (UniProtKB accession: O15552)
- Inspect the structure in the Structure viewer. Which parts of the protein are predicted with the lowest confidence?
- Inspect the Predicted Aligned Error (PAE) plot. Why are residues 150-160 colored white? What can we say about the confidence in the predicted relative orientation of the two protein segments joined by the loop around residues 150-160?
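
To make the PAE question more concrete, the following sketch averages the PAE over a block of the matrix. It assumes the PAE values are already loaded as a square list of lists in ångströms, indexed by residue; the matrix shown is a made-up 4-residue example, not the O15552 data, and `mean_pae` is a hypothetical helper name. Low values within a segment combined with high values between two segments indicate that each segment is predicted confidently on its own, but their relative orientation is not.

```python
def mean_pae(pae, rows, cols):
    """Average PAE over a block; residue ranges are 1-based and inclusive."""
    r0, r1 = rows
    c0, c1 = cols
    values = [pae[i][j]
              for i in range(r0 - 1, r1)
              for j in range(c0 - 1, c1)]
    return sum(values) / len(values)


# Toy matrix: residues 1-2 form one segment, residues 3-4 another.
toy_pae = [
    [0, 1, 20, 22],
    [1, 0, 21, 23],
    [20, 21, 0, 1],
    [22, 23, 1, 0],
]

within = mean_pae(toy_pae, (1, 2), (1, 2))   # inside the first segment
between = mean_pae(toy_pae, (1, 2), (3, 4))  # first segment vs. second
print(within, between)
```

A real PAE matrix is not necessarily symmetric, so it can be worth computing the block in both directions, that is, with `rows` and `cols` swapped.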

Target protein haloalkane dehalogenase from Tepidicaulis marinus (UniProtKB accession: A0A081BEP5), sequence:

>A0A081BEP5
MKVLRTPDACFEGLEDYPFTPHYHEFKDADGTPLRLHYVEEGPKDAAPVLLMHGEPSWSFLYRHMVRGLAEKGHRVLAPDLIGFGKSDKPAEQEDYTFERHVAWMSDWLTGLDLKNITLFCQDWGGLIGLRLVAAFPERFARVVVANTGLPIGTGWSEAFKQWLDFSQSTPVLPVADIVNGGSVRDFSEADKKAYDAPFPDESFKAGARRFPALVPITPEHPSVEENKAAWKVLEAFEKPFLTAFSDQDPVTKGGDKIFQERVPGAKGQPHVTIEGGGHFLQEDKPAELVELIDGFIKRTA

Predict the structure of the target protein using the AlphaFold3 server
- Click on the "Continue with Google" button, log in using your Google account, fill in the institution details "Faculty of Science, Masaryk University", and agree to the terms and conditions
- In the input form, select Molecule type: Protein, Copies: 1, and paste the sequence of the target protein
- Click on "Continue and preview job". The calculation will take a few minutes.
  - Precomputed result
Assess the quality of the model using the QMEAN server
Compare the model with the structures from the previous task using PyMOL
- Display the QMEAN score using the command: spectrum b, red_blue, minimum=0, maximum=1
- precomputed result: QMEAN: AlphaFold3
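
The comparison in PyMOL can be prepared as a small command script. The sketch below only generates the text of such a script using the standard PyMOL commands `load`, `align`, and `spectrum`; the file names are placeholders for whatever you downloaded, and the object names `ref` and `model1`, `model2`, ... are arbitrary choices for this illustration.

```python
def pymol_script(reference, others):
    """Return PyMOL commands that load structures, superpose them, and color by B-factor."""
    lines = [f"load {reference}, ref"]
    for i, path in enumerate(others, start=1):
        lines.append(f"load {path}, model{i}")
        # Superpose each model onto the reference structure.
        lines.append(f"align model{i}, ref")
    # Same coloring command as in the task above (scores stored in the B-factor column).
    lines.append("spectrum b, red_blue, minimum=0, maximum=1")
    return "\n".join(lines)


# Placeholder file names; substitute the files you actually downloaded.
script = pymol_script("template.pdb", ["swissmodel.pdb", "alphafold3.pdb"])
print(script)
```

The printed text can be saved to a file with the `.pml` extension and run from the PyMOL command line with `@` followed by the file name, or the commands can be typed in one by one.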

Model quality assessment

RCSB PDB (https://www.rcsb.org), QMEAN (https://swissmodel.expasy.org/qmean/), EMBOSS Needle (https://www.ebi.ac.uk/Tools/psa/emboss_needle/)

- The target protein is the haloalkane dehalogenase from Caulobacter crescentus (UniProtKB accession: B8H3S9), sequence:

  >B8H3S9
  MDVLRTPDERFEGLADWSFAPHYTEVTDADGTALRIHHVDEGPKDQRPILLMHGEPSWAYLYRKVIAELVAKGHRVVAPDLVGFGRSDKPAKRTDYTYERHVAWMSAWLEQNDLKDIVLFCQDWGGLIGLRLVAAFPERFSAVVVSNTGLPIGVGKSEGFEAWLNFSQNTPELPVGFILNGGTARDLSDAERSAYDAPFPDESYKEGARIFPALVPITPEHASVEENKAAWAVLETFDKPFVTAFSDADPITRGGEAMFLARVPGTKNVAHTTLKGGHFVQEDSPVEIAALLDGLVAGLPQA
- Download the models built from templates with different levels of similarity to the target protein: 1B6G, chain A, and 2HDW, chain A.
  - Model 1B6G and Model 2HDW
- Study the sequence alignment of the target protein with each template and estimate the quality and precision you can expect from the resulting homology models. Focus primarily on the number of insertions and deletions. You can calculate the sequence alignment using various tools, for example EMBOSS Needle.
- Download the model from AlphaFoldDB and assess all three models using QMEAN
  - precomputed QMEAN results: models based on templates 1B6G and 2HDW, and AlphaFold3
- Compare the models using PyMOL
  - Show the QMEAN scores on the structures using the command: spectrum b, red_blue, minimum=0, maximum=1
- Do you find the differences between the models significant?
- Which model would you trust the most?
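
The alignment inspection above (sequence identity, insertions and deletions) can be sketched as follows. The code assumes you have copied the two aligned sequences, including `-` gap characters, out of the alignment tool's output as two strings of equal length. Identity is defined here as identical columns divided by the full alignment length, which may differ from the definition a given tool reports. The sequences are toy data and `alignment_stats` is a hypothetical helper.

```python
def alignment_stats(a: str, b: str) -> dict:
    """Summarize a pairwise alignment given as two gapped strings of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")

    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    gap_columns = sum(1 for x, y in zip(a, b) if x == "-" or y == "-")

    # Count indel events: each contiguous run of gaps in either sequence is one event.
    indels = 0
    for seq in (a, b):
        previous_is_gap = False
        for ch in seq:
            is_gap = ch == "-"
            if is_gap and not previous_is_gap:
                indels += 1
            previous_is_gap = is_gap

    return {
        "identity_pct": 100.0 * identical / len(a),
        "gap_columns": gap_columns,
        "indels": indels,
    }


# Toy alignment for illustration only.
stats = alignment_stats("MKV-LRTPD", "MKVALR--D")
print(stats)
```

Many short indel events scattered through the alignment usually hurt a homology model more than a single long gap at a terminus, so the event count is often more informative than the raw number of gap columns.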

Structural Biology – practice

Loschmidt Laboratories of Protein Engineering, Faculty of Science, Masaryk University