Python API examples

Python API single calculation

On this page, we will show how to use the Python API to set up a single CaverDock calculation. The user needs to prepare the input files beforehand - protein structure, ligand molecule and tunnel. Bellow, we show the individual parts of the script.

First, the user must import all the needed modules.

import logging
import os
import pandas as pd

from pycaverdock import ActiveSiteLocation, CaverDock, cfggen, Direction, DiscRanges, EnergyProfilePlot, Ligand, MGLToolsWrapper, Receptor, load_tunnel, discretize_tunnel, TrajectoryType, plot_results, CaverDockTrajectory, generate_caverdock_config, BoundaryType
from pycaverdock.experiment import Experiment, Workdir, convert_eprofile_analysis
from pycaverdock.utils import dictval, get_basename

Set the work directory and location of CaverDock binary and MGLTools. The location of the programs will be different for the Apptainer container and local binaries.

WORKDIR_PATH = "work-single"
MGLTOOLS_HOME_PATH = None
CAVERDOCK_BINARY_PATH = None
DEBUG = False

logging.basicConfig(level=logging.DEBUG if DEBUG else logging.INFO, format="[%(asctime)s] %(message)s")
log = logging.getLogger("cd-single-calculation")

mgltools = MGLToolsWrapper("/home/user/Documents/MGLtools/MGLTools-1.5.7rc1/")
caverdock = CaverDock("/home/user/Documents/CaverDock/local_bin/caverdock")

Set important parameters and the location of input files.

batch_name = 'Single_calculation'

log.info(f"Running batch {batch_name}")

direction = Direction.OUT
trajectory_type = TrajectoryType.LOWERBOUND

analysis_data = pd.DataFrame()
plots = []
experiment_log = []

receptor_path = "1K63-a.pdb"
ligand_path = "m004.pdb"
tunnel_path = "tun_cl_003_1.pdb"
tunnel_object = load_tunnel(tunnel_path)

title = f"r{get_basename(receptor_path)}-l{get_basename(ligand_path)}-t{get_basename(tunnel_path)}-d{direction.name.lower()}-{trajectory_type.name.lower()}"

log.info(f"Running {title}")
log.info(f"Experiment settings: receptor={receptor_path}, tunnel={tunnel_path}, ligand={ligand_path}, direction={direction.name}, trajectory_type={trajectory_type.name}, seed={seed or '<default>'}, exhaustiveness={exhaustiveness or '<default>'}")

Prepare the receptor and ligand pdbqt files, discretize the tunnel. You can extend the tunnel if needed.

receptor = Receptor(receptor_path) if receptor_path.endswith(".pdbqt") else mgltools.prepare_receptor(receptor=receptor_path, output_path=get_basename(receptor_path)+".pdbqt")
print(receptor)

ligand = Ligand(ligand_path) if ligand_path.endswith(".pdbqt") else mgltools.prepare_ligand(ligand=ligand_path, output_path=get_basename(ligand_path)+".pdbqt")

discretized_tunnel = discretize_tunnel(tunnel=tunnel_object, delta=0.3)
#print(discretized_tunnel)

extended_tunnel = discretized_tunnel
ext_distance = 0.5
if ext_distance > 0:
    extended_tunnel = discretized_tunnel + discretized_tunnel.extend(
        distance=ext_distance, step=0.2)

Run CaverDock.

trajectory = caverdock.run(
    directory=".",
    name="cd-single",
    ligand=ligand.path,
    receptor=receptor.path,
    tunnel=extended_tunnel,
    direction=direction,
    trajectory_type=trajectory_type,
    mpi_processes=4,
    seed=2,
    catomnum=0,
    exhaustiveness=1
)

Create the energy profile file from the pdbqt file, generate the simplified report and draw plots.

eprofile = trajectory.energy_profile
eprofile.write_to_dat("profile.dat")

disc_ranges = DiscRanges(0,0.5,0.1,1,0.99,1,BoundaryType.FRACTION)

analysis = eprofile.analyse(trajectory_type=trajectory_type, disc_ranges=disc_ranges)

af = convert_eprofile_analysis(title, analysis, receptor_path, tunnel_path, ligand_path)
af.to_csv("analysis.csv", index=False)

analysis_data = pd.concat((analysis_data, af))
plots.append(EnergyProfilePlot(title, eprofile, trajectory_type))

analysis_data.to_csv(os.path.join("results.csv"), index=False)
log.info("Plotting results")

plot_results(plots, output=os.path.join("out.png"), share_axes=True)

Complete script for download: api-example-single.py

Example case data: test-single.tar.gz

Python API batch calculation

Here we will show how to use the Python API to set up batch calculations with CaverDock. The user needs to prepare the input files beforehand - protein structure, ligand molecule and tunnel. Furthermore, the user must prepare a yaml file which references all the input files to create all the possible combinations for the batch calculation based on the definition of the batch calculation. The workflow is similar to the calculations run with cd-screening script from the package with the difference, the parameters can be set either in the yaml file or in the custom script for the batch calculation. If the additional parameters are set in the yaml file and in the custom script the values for the parameters will be overridden by the settings in the custom script. In comparison with the single calculation, we will use a higher level API class experiment which loads the yaml file, runs all the calculations in a loop and takes care of most of the needed settings automatically. Be careful of correct indentation in the script. In comparison with screening the receptor tunnel pairs must be defined manually in the yaml file. The $product is related to the product of the combinations created with the cfggen library, not to any biological product.

receptors_1:
  - proteins/01/1K63-a.pdb
tunnels_1:
  - path: proteins/01/tun_cl_001_1.pdb
    discretization_delta: 0.3
    extension:
      distance: 2
      step: 0.2
  - proteins/01/tun_cl_003_1.pdb

ligands_1:
  - path: ligands/m004.pdb
    drag_atom: 1
  - ligands/m006.pdb
  - ligands/m018.pdb

receptor_pairs:
  $+:
    - $product:
        receptor:
          $ref: receptors_1
        tunnel:
          $ref: tunnels_1

batches:
  - name: Batch example
    inputs:
      $product:
        receptor_pair:
          $ref: receptor_pairs
        ligand:
          $ref: ligands_1

Example yaml files: example_batch_experiment.yaml, batches.yaml

Bellow, we show the individual parts of the script.

First, the user must import all the needed modules.

import logging
import os
import pandas as pd

from pycaverdock import ActiveSiteLocation, BoundaryType, CaverDock, Direction, DiscRanges, EnergyProfilePlot, Ligand, MGLToolsWrapper, Receptor, TrajectoryType, cfggen, plot_results
from pycaverdock.experiment import Experiment, Workdir, convert_eprofile_analysis
from pycaverdock.utils import dictval, get_basename

Set the work directory and location of CaverDock binary and MGLTools.

WORKDIR_PATH = "work"
EXPERIMENT_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "batches.yaml")
MGLTOOLS_HOME_PATH = None
CAVERDOCK_BINARY_PATH = None
DEBUG = False

logging.basicConfig(level=logging.DEBUG if DEBUG else logging.INFO, format="[%(asctime)s] %(message)s")
log = logging.getLogger("cd-batch")

mgltools = MGLToolsWrapper(MGLTOOLS_HOME_PATH)
caverdock = CaverDock(CAVERDOCK_BINARY_PATH)

Set the location of the yaml file and define the experiments.

batches = cfggen.build_template_from_file(EXPERIMENT_FILE)["batches"]

for batch in batches:
    batch_name = batch['name']

    log.info(f"Running batch {batch_name}")

    workdir = Workdir(os.path.join(WORKDIR_PATH, dictval(batch, "dir", batch_name.replace(" ", "_"))))
    direction = dictval(batch, "direction", Direction.OUT, lambda x: Direction[x.upper()])
    trajectory_type = dictval(batch, "trajectory_type", TrajectoryType.LOWERBOUND, lambda x: TrajectoryType[x.upper()])
    exhaustiveness = dictval(batch, "exhaustiveness", fn=int)
    seed = dictval(batch, "seed", fn=int)

    analysis_data = pd.DataFrame()
    plots = []
    experiment_log = []

    batch_inputs = batch["inputs"]
    total_experiment_num = len(batch_inputs)
    current_experiment_num = 1

    for (index, inputs) in enumerate(batch["inputs"]):
        receptor_path = inputs["receptor_pair"]["receptor"]
        ligand_def = inputs["ligand"]
        ligand_path = ligand_def if type(ligand_def) == str else ligand_def["path"]
        tunnel_def = inputs["receptor_pair"]["tunnel"]
        tunnel_path = tunnel_def if type(tunnel_def) == str else tunnel_def["path"]

        experiment_name = f"r{get_basename(receptor_path)}-l{get_basename(ligand_path)}-t{get_basename(tunnel_path)}-d{direction.name.lower()}-{trajectory_type.name.lower()}"

        log.info(f"Running experiment {experiment_name} [{current_experiment_num}/{total_experiment_num}]")
        log.info(f"Experiment settings: receptor={receptor_path}, tunnel={tunnel_path}, ligand={ligand_path}, direction={direction.name}, trajectory_type={trajectory_type.name}, seed={seed or '<default>'}, exhaustiveness={exhaustiveness or '<default>'}")

        experiment = Experiment(workdir, experiment_name, mgltools, caverdock)
        # The following lines are optional, they will copy the inputs to the experiment directory
        experiment.store_input_file(receptor_path)
        experiment.store_input_file(ligand_path)
        experiment.store_input_file(tunnel_path)

Prepare the receptor and ligand pdbqt files, discretize the tunnel. You can extend the tunnel if needed (use correct indentation).

receptor = Receptor(receptor_path) if receptor_path.endswith(".pdbqt") else experiment.prepare_receptor(receptor_path)
ligand = Ligand(ligand_path) if ligand_path.endswith(".pdbqt") else experiment.prepare_ligand(ligand_path)

discretized_tunnel = experiment.discretize_tunnel(tunnel_path, delta=dictval(tunnel_def, "discretization_delta", 0.3, float))

# Extend tunnel
extended_tunnel = discretized_tunnel
ext_distance = dictval(tunnel_def, ["extension", "distance"], 2, float)
if ext_distance > 0:
    extended_tunnel = discretized_tunnel + discretized_tunnel.extend(
        distance=ext_distance, step=dictval(tunnel_def, ["extension", "step"], 0.2, float),
    )

Run CaverDock (use correct indentation).

trajectory = experiment.run_caverdock(
    ligand, receptor, extended_tunnel, direction, trajectory_type,
    seed=seed,
    catomnum=dictval(ligand_def, "drag_atom", fn=int),
    exhaustiveness=exhaustiveness
)

Create the energy profile file from the pdbqt file and gather all the results (use correct indentation).

eprofile = trajectory.energy_profile
eprofile.write_to_dat(experiment.result_path("profile.dat"))

disc_ranges_vals = dictval(tunnel_def, "disc_ranges")
disc_ranges = None
if disc_ranges_vals:
    disc_ranges = DiscRanges(
        disc_ranges_vals["bound"][0], disc_ranges_vals["bound"][1],
        disc_ranges_vals["max"][0], disc_ranges_vals["max"][1],
        disc_ranges_vals["surface"][0], disc_ranges_vals["surface"][1],
        dictval(screening, "type", DiscRanges.type, lambda x: BoundaryType[x.upper()])
    )
analysis = eprofile.analyse(trajectory_type=trajectory_type, disc_ranges=disc_ranges)

# Gather results
af = convert_eprofile_analysis(experiment_name, analysis, receptor_path, tunnel_path, ligand_path)
af.to_csv(experiment.result_path("analysis.csv"), index=False)

analysis_data = pd.concat((analysis_data, af))
plots.append(EnergyProfilePlot(experiment_name, eprofile, trajectory_type))
current_experiment_num += 1

Generate the simplified report and draw plots.

analysis_data.to_csv(os.path.join(workdir.directory, "results.csv"), index=False)
log.info("Plotting results")
plotconf = dictval(batch, "plot", {})
plot_params = {
    "share_axes": dictval(plotconf, "share_axes", fn=bool),
    "force_active_site_location": dictval(plotconf, "active_site_location", fn=lambda x: ActiveSiteLocation[x.upper()]),
    "lowerbound_color": dictval(plotconf, "lowerbound_color"),
    "upperbound_color": dictval(plotconf, "upperbound_color"),
    "radius_color": dictval(plotconf, "radius_color"),
    "max_plots_per_row": dictval(plotconf, "plots_per_row"),
    "show_zero_energy_line": dictval(plotconf, "show_zero_energy_line"),
    "zero_energy_line_color": dictval(plotconf, "zero_energy_line_color")
}
plot_params = {k: v for k, v in plot_params.items() if v is not None}
plot_results(plots, output=os.path.join(workdir.directory, "out.png"), **plot_params)

Complete script for download: api-example-batch.py

Example case data: test-batch.tar.gz