Virtual screening
This page shows how to set up a complete virtual screening with pyCaverDock without having to understand Python code or build custom pipelines that control the screening through API classes and methods. The screening capabilities of the provided cd-screening script were introduced on the Quick start page; here we show how to prepare the yaml file for screening and describe all the options the file can contain.
Please read through the text below and also look at the sample yaml file virtual_screening.yaml (available in the archive below), which lists the options in detail. Pay attention to the file structure and to the line indentation of the individual parameters. The parameters used in the yaml file correspond directly to classes and commands from the API; you can learn more about each of them on the Step-by-step analysis and Command line options pages of the documentation.
Example Virtual Screening data: virtual_screening.tar.gz
Receptor and tunnel pairs
At the beginning of the file, we start with the definition of the receptor and the tunnel(s) which belong to it.
receptor_1:
  protein: proteins/01/1K63-a.pdb
  tunnels:
    - path: proteins/01/tun_cl_001_1.pdb
      discretization_delta: 0.3
      extension:
        distance: 2
        step: 0.2
      disc_ranges:
        type: DISC_NUMBER
        bound: [1, 50]
        max: [10, 99]
        surface: [98, 99]
    - proteins/01/tun_cl_003_1.pdb
Going line by line, receptor_1 is an arbitrary name for the set of receptor-tunnel pairs. The remaining parameter names are fixed and must not be changed! The user must specify the path to the receptor pdb file on the protein line. The tunnel pdb files are listed under tunnels. To analyze more tunnels for the same protein, add the path for each of them. A tunnel file can be added simply as a plain list item (- followed by the file path) or, if the user wants to set the advanced parameters for that tunnel, it must be referenced as - path: followed by the file path.
In the latter case, the other parameters are as follows. The discretization parameter discretization_delta sets the size of the discs. With extension you can set the parameters for extending the tunnel: distance specifies the length of the extension and step the size of the extended discs. The disc_ranges parameter defines the parts of the energy profile from which the important energies are extracted; this analysis of the ligand passage energetics is done at the end of the CaverDock calculation. The parameter type specifies the format of the values used (DISC_NUMBER or FRACTION), and with bound, max and surface the user specifies the individual parts of the profile. You can find more information about the energy analysis in Results interpretation.
The receptor-tunnel pairs are generated automatically at the beginning of the screening, based on the definition in the yaml file. Please be careful when setting up the yaml file, especially with how the parameters are introduced, the parameter names and the correct levels of indentation.
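As a minimal sketch of the short form described above, a second receptor set could list its tunnels as plain paths, without any per-tunnel options. The name receptor_2 matches the reference used in the screening example further down, but the file paths here are placeholders invented for illustration and are not taken from the example archive.

```yaml
# Hypothetical receptor set; the file paths are placeholders.
receptor_2:
  protein: proteins/02/receptor.pdb
  tunnels:
    # Plain list items: no advanced per-tunnel parameters are set.
    - proteins/02/tunnel_1.pdb
    - proteins/02/tunnel_2.pdb
```

Switching one of these tunnels to the - path: form later only requires changing that single item; the other tunnel can stay as a plain path.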
Ligands
ligands_3:
  - path: ligands/m004.pdb
    drag_atom: 1
  - ligands/m006.pdb
  - ligands/m018.pdb
As with the receptors, the first line gives the reference name for the set of ligands. The user must specify the path to each ligand pdb file either as a plain list item (-) or with - path if other advanced parameters are needed. In the latter case, the only extra parameter is drag_atom, which selects the atom in the ligand file that will be used to drag the molecule through the studied tunnel (default 0, which selects the atom closest to the centre of the molecule).
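The two ways of listing ligands can be sketched as separate sets, which a screening can then reference by name. The set names ligands_1 and ligands_2 match the references used in the screening example further down; the file names and the drag_atom value are placeholders chosen for illustration only.

```yaml
# Hypothetical ligand sets; the file names are placeholders.
ligands_1:
  # Plain items: drag_atom is left at its default (0).
  - ligands/ligand_a.pdb
  - ligands/ligand_b.pdb
ligands_2:
  # Advanced form: the path plus an explicit drag atom (illustrative value).
  - path: ligands/ligand_c.pdb
    drag_atom: 5
```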
Screening
screenings:
  - name: Screening 1
    dir: screening_1
    direction: IN
    trajectory_type: LOWERBOUND
    seed: 2
    plot:
      share_axes: no
      active_site_location: right
      lowerbound_color: blue
      upperbound_color: green
      radius_color: yellow
      show_zero_energy_line: yes
      zero_energy_line_color: purple
      plots_per_row: 3
    receptors:
      - $ref: receptor_1
      - $ref: receptor_2
    ligands:
      $+:
        - $ref: ligands_1
        - $ref: ligands_2
After the receptor-tunnel pairs and ligand sets, we can set up the screening itself. The analysis of different receptors and ligands can be split into separate screenings, and the user can also run two screenings with the same input data but different settings; it all depends on the user's needs.
Looking at the screening parameters, name is the name of the screening batch and dir is the name of the directory where the data will be saved. The next three parameters (direction, trajectory_type and seed) specify settings for CaverDock. The direction determines how the ligand passage will be simulated: binding (IN) or unbinding (OUT). The type of CaverDock trajectory is set with trajectory_type to either the quick discontinuous LOWERBOUND or the continuous UPPERBOUND; the difference between the two is discussed in other parts of the documentation and in the CaverDock publications. The seed parameter sets the random seed for the calculation.
Next come the plot settings, which control the visual appearance of the energy plots generated at the end of the screening. The parameter share_axes specifies whether the plots for all experiments use the same axis scales. The active_site_location parameter selects on which side of the plot the active site (tunnel start) is drawn, depending on the user's preference. The parameters lowerbound_color, upperbound_color and radius_color set the colours of the lines in the plot. The show_zero_energy_line parameter controls whether the zero-energy line is drawn on the y axis, and zero_energy_line_color sets its colour. With plots_per_row, the user selects how many plots are drawn next to each other on a single row of the final figure.
Finally, we have the most important parameters for the screening: receptors and ligands. These govern which sets of receptor-tunnel pairs and which sets of ligands will be analyzed in the screening. Please use the exact format shown above.
All receptor-tunnel-ligand combinations are generated for each defined screening at the start of the screening process. The parameters name, receptors and ligands are mandatory. The remaining screening parameters are optional; if the user does not specify them, the screening runs with their default values.
Once you have everything ready, you can run the screening using the cd-screening script:
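To illustrate the mandatory and optional parameters, the sketch below defines two screenings over the same inputs: the first sets only name, receptors and ligands, while the second overrides a few optional settings. The screening names and the directory name are made up for this example. It also assumes that the $+ form accepts a single reference, since the format shown above is kept unchanged.

```yaml
screenings:
  # Only the mandatory parameters; all other settings use their defaults.
  - name: Defaults only
    receptors:
      - $ref: receptor_1
    ligands:
      $+:
        - $ref: ligands_3
  # Same inputs, but simulating unbinding with a continuous trajectory.
  - name: Unbinding upper bound
    dir: screening_out_upperbound
    direction: OUT
    trajectory_type: UPPERBOUND
    receptors:
      - $ref: receptor_1
    ligands:
      $+:
        - $ref: ligands_3
```

Giving the second screening its own dir presumably keeps its results apart from those of the first run, which is convenient when the same receptor-ligand combinations are compared under different settings.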
$ apptainer exec /path/to/caverdock.sif cd-screening -p 4 virtual_screening.yaml