Step-by-step analysis

The typical CaverDock workflow consists of the following steps:

  1. calculation and selection of receptor tunnels with CAVER

  2. discretization of the tunnel(s) to a set of discs with CaverDock’s tool Discretizer

  3. preparation of input files for docking (by conversion of PDB to PDBQT format, optionally setting the side-chain flexibility)

  4. configuration and execution of CaverDock

  5. visualization and interpretation of results

In the following text, we describe the individual steps of the workflow. You can test the workflow described in this section using the example input stored in the example folder packed with CaverDock. Any step may be skipped, as the folder also contains intermediate and final results.

Note

All executables have built-in help accessible via the --help command-line option.

Important

If you are using the Apptainer container, keep all files in one directory (or its subdirectories); otherwise, you may face problems with inaccessible paths.

Tunnel calculation

Tunnels can be calculated and selected locally with the CAVER software [3] or through the web portal Caver Web [5]. For more details on CAVER usage, see the CAVER User Guide. Already exported tunnels are available in the example packed with CaverDock.

Selecting the correct CAVER file with tunnel

When tunnels are calculated in a single protein structure with CAVER, copy the relevant tunnel file from the clusters_timeless directory of the CAVER output, e.g., out/data/clusters_timeless/tun_cl_00X_Y.pdb. In this case, the file for each tunnel cluster contains only one tunnel.

When tunnels are calculated in multiple snapshots, check the tunnel_characteristics.csv file in the out/analysis folder and find the tunnel number (and the protein snapshot, which is needed to prepare the receptor file) that corresponds to the studied tunnel cluster. Then go through the relevant cluster file, which contains the structures of the tunnels, in the clusters directory of the CAVER output, e.g., out/data/clusters/tun_cl_00X_Y.pdb. Find the part with the tunnel of interest, extract it, and renumber the dummy atoms so that their numbers start from 1. The tunnel PDB file is then ready for discretization.
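
The renumbering step can be sketched with a small awk filter. This is a minimal sketch, assuming that the extracted part consists of standard PDB ATOM or HETATM records, whose serial number occupies columns 7 to 11; the function name renumber_tunnel_atoms and the file names in the usage line are hypothetical. All other records pass through unchanged, so any CONECT records would still refer to the old numbering and would need separate handling.

```shell
# Renumber PDB atom serials (columns 7-11) so that they start from 1.
# Assumption: the dummy atoms of the tunnel are stored as ATOM or HETATM records.
renumber_tunnel_atoms() {
  awk '
    /^(ATOM  |HETATM)/ {
      n++
      # keep the record name (columns 1-6) and everything from column 12 on
      printf "%s%5d%s\n", substr($0, 1, 6), n, substr($0, 12)
      next
    }
    { print }   # all other records are copied unchanged
  '
}
```

With the function defined in the current shell, it reads the extracted tunnel from standard input:

$ renumber_tunnel_atoms < extracted_tunnel.pdb > tunnel.pdb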

Tunnel discretization

Once one or more tunnels have been exported, they must be discretized into a set of cuts (discs evenly slicing the tunnel). The discretizer tool takes a tunnel in PDB format as input and produces a discretized tunnel for CaverDock. Please note that CAVER considers the active site to be the tunnel’s beginning and the surface of the receptor to be the tunnel’s end; CaverDock adopts the same direction. See the figure for an illustration of the discretizer output.

$ cd /path/to/input_data
$ apptainer exec /path/to/caverdock.sif discretizer -f tunnel.pdb -o tunnel.dsd
Figure: illustration of the discretizer output (../_images/discretizer.png)

Tunnel extension

To move the ligand farther from the protein during unbinding, or to start it at a larger distance during binding, we recommend extending the tunnel by a few angstroms.

$ cd /path/to/input_data
$ apptainer exec /path/to/caverdock.sif cd-extendtunnel -f tunnel.dsd -d 5 > tunnel-extended.dsd

Direction of CaverDock simulation

By default, the simulation runs out of the protein because it follows the order of the discs representing the tunnel. To simulate binding, i.e., the ligand moving into the protein, reverse the order of the discs.

$ tac tunnel-extended.dsd > tunnel-reversed.dsd
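
A quick sanity check, sketched below, is to reverse the reversed file once more and compare the result with the input. It relies on the same assumption as the tac command, namely that each line of the .dsd file describes one disc; the function name is_reversed_copy is hypothetical.

```shell
# Succeed (exit status 0) only if the second file is exactly the first file
# with its lines in reverse order.
is_reversed_copy() {
  tac "$2" | cmp -s - "$1"
}
```

With the function defined in the current shell:

$ is_reversed_copy tunnel-extended.dsd tunnel-reversed.dsd && echo "disc order reversed"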

Ligand & receptor preparation

The receptor and the ligand must be converted from PDB to PDBQT format before docking:

$ cd /path/to/input_data
$ apptainer exec /path/to/caverdock.sif prepare_receptor4 -r receptor.pdb -o receptor.pdbqt
$ apptainer exec /path/to/caverdock.sif prepare_ligand4 -l ligand.pdb -o ligand.pdbqt

With both the Apptainer container and the prebuilt package, these two scripts accept the same additional parameters as their standalone versions from MGLTools. You can list them via the --help command-line option.

CaverDock execution

Finally, CaverDock must be configured and executed. A minimal configuration must specify the search grid box and the names of the receptor, ligand, and tunnel files. Please note that the search box must contain the ligand’s whole trajectory (i.e., the whole tunnel). Although the configuration file can be prepared manually or with MGLTools, the CaverDock package contains a script that automatically assembles a configuration file with basic settings.

$ cd /path/to/input_data
$ apptainer exec /path/to/caverdock.sif cd-prepareconf -r receptor.pdbqt -l ligand.pdbqt -t tunnel.dsd > caverdock.conf
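
Because the search box must enclose the whole tunnel, it can help to know the extent of the tunnel when reviewing the configuration. The following is a minimal sketch, assuming the tunnel PDB file uses the standard fixed-column PDB layout (x, y and z in columns 31 to 54 of ATOM or HETATM records); the function name tunnel_bounding_box is hypothetical. It considers only the atom centres, so a real search box still needs a margin for the tunnel radius and the size of the ligand.

```shell
# Print the centre and extent of the atom coordinates found in a PDB file.
# Assumption: standard PDB columns (x: 31-38, y: 39-46, z: 47-54).
tunnel_bounding_box() {
  awk '
    /^(ATOM  |HETATM)/ {
      x = substr($0, 31, 8) + 0
      y = substr($0, 39, 8) + 0
      z = substr($0, 47, 8) + 0
      if (!seen) { minx = maxx = x; miny = maxy = y; minz = maxz = z; seen = 1 }
      if (x < minx) minx = x
      if (x > maxx) maxx = x
      if (y < miny) miny = y
      if (y > maxy) maxy = y
      if (z < minz) minz = z
      if (z > maxz) maxz = z
    }
    END {
      if (!seen) exit 1   # no atom records found
      printf "center %.3f %.3f %.3f\n", (minx + maxx) / 2, (miny + maxy) / 2, (minz + maxz) / 2
      printf "extent %.3f %.3f %.3f\n", maxx - minx, maxy - miny, maxz - minz
    }
  ' "$1"
}
```

With the function defined in the current shell:

$ tunnel_bounding_box tunnel.pdb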

You can include any CaverDock command-line option as a parameter in the configuration file. For example, logging of the computation to files can be turned on either by calling CaverDock with the command-line option --log logName or by adding the line log=logName to the configuration file. When using the API, the extra parameters must be supplied when the calculation is launched.
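
Adding such a line can be scripted. The helper below is a sketch under the assumption that the configuration file holds one key=value option per line, as in the log=logName example; the function name set_conf_option is hypothetical and is not part of CaverDock. It removes earlier lines that set the same key and appends the new value.

```shell
# Set key=value in a configuration file, replacing earlier lines for the same key.
# Assumption: one option per line, written as key=value (spaces around "=" are matched too).
set_conf_option() {
  conf=$1 key=$2 value=$3
  conf_tmp=$(mktemp) || return 1
  # copy every line that does not already set this key
  awk -v k="$key" '$0 !~ ("^[ \t]*" k "[ \t]*=")' "$conf" > "$conf_tmp"
  printf '%s=%s\n' "$key" "$value" >> "$conf_tmp"
  # write back in place so that the permissions of the original file are kept
  cat "$conf_tmp" > "$conf" && rm -f "$conf_tmp"
}
```

With the function defined in the current shell:

$ set_conf_option caverdock.conf log logName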

With the configuration file prepared, CaverDock can be executed. CaverDock uses MPI for parallel execution. For tunnel analysis, it must be run with at least two processes (by setting the mpirun parameter -np to 2 or more).

$ cd /path/to/input_data
$ apptainer exec /path/to/caverdock.sif mpirun -np 8 caverdock --config caverdock.conf --out outName --log logName

CaverDock can be run on a laptop or a desktop computer; a run commonly takes from minutes to tens of minutes. For large-scale execution or highly complex receptors and ligands, it may be more convenient to run it on a computational cluster.

CaverDock outputs

CaverDock generates several output files:

Note

outName is a placeholder that can be set with the parameter --out outName

outName-lb.pdbqt

the lower-bound trajectory

outName-ub.pdbqt

the upper-bound trajectory

outName-ub-alternatives.pdbqt

alternative upper-bound trajectories found by the heuristic

outName-min.pdbqt

the upper-bound trajectory containing the snapshot with the lowest energy

outName-failed.pdbqt

a partial lower-bound or upper-bound trajectory, if CaverDock failed to find the lower-bound or upper-bound trajectory

outName-lb.txt

a checkpoint file with the lower-bound trajectory that can be loaded in future executions with --load_lb outName-lb.txt

outName-ranges.dat

a file with ranges of energies for all discs

bottlenecksName.noOfDisc

a file with information about bottlenecks for each disc if CaverDock was called with option --dump_bottlenecks bottlenecksName

logName.noOfProcess

logs with detailed information about CaverDock run for each process if CaverDock was called with option --log logName
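
The presence of these files gives a first indication of how a run ended. The sketch below classifies a run from the file names listed above; the function name caverdock_run_status and its three status words are hypothetical, and it assumes that no stale files from an earlier run with the same prefix remain in the directory.

```shell
# Classify a CaverDock run from the output files present for a given --out prefix.
caverdock_run_status() {
  out=$1
  if [ -f "${out}-lb.pdbqt" ] && [ -f "${out}-ub.pdbqt" ]; then
    echo "complete"    # both the lower-bound and upper-bound trajectories exist
  elif [ -f "${out}-failed.pdbqt" ]; then
    echo "failed"      # only a partial trajectory was written
  else
    echo "no-output"   # none of the expected trajectory files was found
  fi
}
```

With the function defined in the current shell:

$ caverdock_run_status outName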

Results visualization

The files outName-lb.pdbqt and outName-ub.pdbqt are created upon successful processing of the tunnel by CaverDock. The first file contains a trajectory estimating the lower bound of the energy profile. The lower-bound trajectory is not contiguous; however, it samples the tunnel without any gap in the ligand movement. The second trajectory is contiguous and estimates the upper bound of the energy profile (it is the best trajectory found; however, a trajectory with a better energy profile may exist). You can explore the geometry of the trajectories with standard tools for viewing PDBQT files (PMV and some versions of PyMOL), and you can also create graphs of the transport energy.

The energies of all snapshots, as well as their positions in the tunnel, are stored in the PDBQT files (lines beginning with REMARK CAVERDOCK). The data can be extracted to text with the provided script.

$ cd /path/to/input_data
$ apptainer exec /path/to/caverdock.sif cd-energyprofile -d tunnel.dsd -t outName-ub.pdbqt -s 0 > energy.dat

The energies should be extracted from the PDBQT file containing the upper-bound trajectory (it contains the geometry of the upper-bound trajectory and both the upper-bound and lower-bound energies). Optionally, if you simulated the trajectory in the IN direction, you can use the flag -r IN to calculate the distance between the discs in the IN order. This will not reverse the output profile.
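
To look at the raw annotation lines directly, they can be filtered out of a trajectory file with grep. This is only a sketch: the REMARK CAVERDOCK prefix is taken from the description of the PDBQT files, the function name caverdock_remarks is hypothetical, and no assumption is made here about the fields that follow the prefix.

```shell
# Print the lines of a PDBQT trajectory file that begin with REMARK CAVERDOCK.
caverdock_remarks() {
  grep '^REMARK CAVERDOCK' "$1"
}
```

With the function defined in the current shell:

$ caverdock_remarks outName-ub.pdbqt | head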

If you simulated the trajectory in the IN direction and you wish to reverse the output energy profile to have the binding site on the left side in plots, use the script cd-reverseeprofile to reverse it:

$ cd /path/to/input_data
$ apptainer exec /path/to/caverdock.sif cd-reverseeprofile energy.dat -o energy-reverse.dat

After the extraction, one can easily create graphs with any tool, such as Gnuplot.

$ cd /path/to/input_data
$ apptainer exec /path/to/caverdock.sif gnuplot -e "set terminal pdf; set output \"energy.pdf\"; set xlabel \"distance\"; set ylabel \"energy\"; plot \"energy.dat\" u 1:4 w l t \"upper-bound\", \"\" u 1:6 w l t \"lower-bound\"" -p
Figure: upper-bound and lower-bound energy profiles (../_images/energies.png)
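
Besides plotting, the extracted profile can be summarized numerically. The sketch below assumes the column layout implied by the Gnuplot command (column 1 is the distance, column 4 the upper-bound energy, column 6 the lower-bound energy) and that any header lines start with #; the function name profile_extremes is hypothetical.

```shell
# Report the highest and lowest energy found in one column of energy.dat.
# Usage: profile_extremes FILE [COLUMN]   (COLUMN defaults to 4, the upper bound)
profile_extremes() {
  awk -v col="${2:-4}" '
    /^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comment and blank lines
    {
      d = $1 + 0
      e = $col + 0
      if (!seen || e > max) { max = e; dmax = d }
      if (!seen || e < min) { min = e; dmin = d }
      seen = 1
    }
    END {
      if (!seen) exit 1                 # no data rows found
      printf "max %.3f at %.3f\n", max, dmax
      printf "min %.3f at %.3f\n", min, dmin
    }
  ' "$1"
}
```

With the function defined in the current shell, the upper-bound and lower-bound columns can be inspected separately:

$ profile_extremes energy.dat
$ profile_extremes energy.dat 6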

You can learn more about the interpretation and evaluation of results in the section Results interpretation.