Python API examples
Python API single calculation
On this page, we will show how to use the Python API to set up a single CaverDock calculation. The user needs to prepare the input files beforehand - protein structure, ligand molecule and tunnel. Bellow, we show the individual parts of the script.
First, the user must import all the needed modules.
import logging
import os
import pandas as pd
from pycaverdock import ActiveSiteLocation, CaverDock, cfggen, Direction, DiscRanges, EnergyProfilePlot, Ligand, MGLToolsWrapper, Receptor, load_tunnel, discretize_tunnel, TrajectoryType, plot_results, CaverDockTrajectory, generate_caverdock_config, BoundaryType
from pycaverdock.experiment import Experiment, Workdir, convert_eprofile_analysis
from pycaverdock.utils import dictval, get_basename
Set the work directory and location of CaverDock binary and MGLTools. The location of the programs will be different for the Apptainer container and local binaries.
WORKDIR_PATH = "work-single"
MGLTOOLS_HOME_PATH = None
CAVERDOCK_BINARY_PATH = None
DEBUG = False
logging.basicConfig(level=logging.DEBUG if DEBUG else logging.INFO, format="[%(asctime)s] %(message)s")
log = logging.getLogger("cd-single-calculation")
mgltools = MGLToolsWrapper("/home/user/Documents/MGLtools/MGLTools-1.5.7rc1/")
caverdock = CaverDock("/home/user/Documents/CaverDock/local_bin/caverdock")
Set important parameters and the location of input files.
batch_name = 'Single_calculation'
log.info(f"Running batch {batch_name}")
direction = Direction.OUT
trajectory_type = TrajectoryType.LOWERBOUND
analysis_data = pd.DataFrame()
plots = []
experiment_log = []
receptor_path = "1K63-a.pdb"
ligand_path = "m004.pdb"
tunnel_path = "tun_cl_003_1.pdb"
tunnel_object = load_tunnel(tunnel_path)
title = f"r{get_basename(receptor_path)}-l{get_basename(ligand_path)}-t{get_basename(tunnel_path)}-d{direction.name.lower()}-{trajectory_type.name.lower()}"
log.info(f"Running {title}")
log.info(f"Experiment settings: receptor={receptor_path}, tunnel={tunnel_path}, ligand={ligand_path}, direction={direction.name}, trajectory_type={trajectory_type.name}, seed={seed or '<default>'}, exhaustiveness={exhaustiveness or '<default>'}")
Prepare the receptor and ligand pdbqt files, discretize the tunnel. You can extend the tunnel if needed.
receptor = Receptor(receptor_path) if receptor_path.endswith(".pdbqt") else mgltools.prepare_receptor(receptor=receptor_path, output_path=get_basename(receptor_path)+".pdbqt")
print(receptor)
ligand = Ligand(ligand_path) if ligand_path.endswith(".pdbqt") else mgltools.prepare_ligand(ligand=ligand_path, output_path=get_basename(ligand_path)+".pdbqt")
discretized_tunnel = discretize_tunnel(tunnel=tunnel_object, delta=0.3)
#print(discretized_tunnel)
extended_tunnel = discretized_tunnel
ext_distance = 0.5
if ext_distance > 0:
extended_tunnel = discretized_tunnel + discretized_tunnel.extend(
distance=ext_distance, step=0.2)
Run CaverDock.
trajectory = caverdock.run(
directory=".",
name="cd-single",
ligand=ligand.path,
receptor=receptor.path,
tunnel=extended_tunnel,
direction=direction,
trajectory_type=trajectory_type,
mpi_processes=4,
seed=2,
catomnum=0,
exhaustiveness=1
)
Create the energy profile file from the pdbqt file, generate the simplified report and draw plots.
eprofile = trajectory.energy_profile
eprofile.write_to_dat("profile.dat")
disc_ranges = DiscRanges(0,0.5,0.1,1,0.99,1,BoundaryType.FRACTION)
analysis = eprofile.analyse(trajectory_type=trajectory_type, disc_ranges=disc_ranges)
af = convert_eprofile_analysis(title, analysis, receptor_path, tunnel_path, ligand_path)
af.to_csv("analysis.csv", index=False)
analysis_data = pd.concat((analysis_data, af))
plots.append(EnergyProfilePlot(title, eprofile, trajectory_type))
analysis_data.to_csv(os.path.join("results.csv"), index=False)
log.info("Plotting results")
plot_results(plots, output=os.path.join("out.png"), share_axes=True)
Complete script for download: api-example-single.py
Example case data: test-single.tar.gz
Python API batch calculation
Here we will show how to use the Python API to set up batch calculations with CaverDock. The user needs to prepare the input files beforehand - protein structure, ligand molecule and tunnel. Furthermore, the user must prepare a yaml file which references all the input files to create all the possible combinations for the batch calculation based on the definition of the batch calculation. The workflow is similar to the calculations run with cd-screening script from the package with the difference, the parameters can be set either in the yaml file or in the custom script for the batch calculation. If the additional parameters are set in the yaml file and in the custom script the values for the parameters will be overridden by the settings in the custom script. In comparison with the single calculation, we will use a higher level API class experiment which loads the yaml file, runs all the calculations in a loop and takes care of most of the needed settings automatically. Be careful of correct indentation in the script. In comparison with screening the receptor tunnel pairs must be defined manually in the yaml file. The $product is related to the product of the combinations created with the cfggen library, not to any biological product.
receptors_1:
- proteins/01/1K63-a.pdb
tunnels_1:
- path: proteins/01/tun_cl_001_1.pdb
discretization_delta: 0.3
extension:
distance: 2
step: 0.2
- proteins/01/tun_cl_003_1.pdb
ligands_1:
- path: ligands/m004.pdb
drag_atom: 1
- ligands/m006.pdb
- ligands/m018.pdb
receptor_pairs:
$+:
- $product:
receptor:
$ref: receptors_1
tunnel:
$ref: tunnels_1
batches:
- name: Batch example
inputs:
$product:
receptor_pair:
$ref: receptor_pairs
ligand:
$ref: ligands_1
Example yaml files: example_batch_experiment.yaml, batches.yaml
Bellow, we show the individual parts of the script.
First, the user must import all the needed modules.
import logging
import os
import pandas as pd
from pycaverdock import ActiveSiteLocation, BoundaryType, CaverDock, Direction, DiscRanges, EnergyProfilePlot, Ligand, MGLToolsWrapper, Receptor, TrajectoryType, cfggen, plot_results
from pycaverdock.experiment import Experiment, Workdir, convert_eprofile_analysis
from pycaverdock.utils import dictval, get_basename
Set the work directory and location of CaverDock binary and MGLTools.
WORKDIR_PATH = "work"
EXPERIMENT_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "batches.yaml")
MGLTOOLS_HOME_PATH = None
CAVERDOCK_BINARY_PATH = None
DEBUG = False
logging.basicConfig(level=logging.DEBUG if DEBUG else logging.INFO, format="[%(asctime)s] %(message)s")
log = logging.getLogger("cd-batch")
mgltools = MGLToolsWrapper(MGLTOOLS_HOME_PATH)
caverdock = CaverDock(CAVERDOCK_BINARY_PATH)
Set the location of the yaml file and define the experiments.
batches = cfggen.build_template_from_file(EXPERIMENT_FILE)["batches"]
for batch in batches:
batch_name = batch['name']
log.info(f"Running batch {batch_name}")
workdir = Workdir(os.path.join(WORKDIR_PATH, dictval(batch, "dir", batch_name.replace(" ", "_"))))
direction = dictval(batch, "direction", Direction.OUT, lambda x: Direction[x.upper()])
trajectory_type = dictval(batch, "trajectory_type", TrajectoryType.LOWERBOUND, lambda x: TrajectoryType[x.upper()])
exhaustiveness = dictval(batch, "exhaustiveness", fn=int)
seed = dictval(batch, "seed", fn=int)
analysis_data = pd.DataFrame()
plots = []
experiment_log = []
batch_inputs = batch["inputs"]
total_experiment_num = len(batch_inputs)
current_experiment_num = 1
for (index, inputs) in enumerate(batch["inputs"]):
receptor_path = inputs["receptor_pair"]["receptor"]
ligand_def = inputs["ligand"]
ligand_path = ligand_def if type(ligand_def) == str else ligand_def["path"]
tunnel_def = inputs["receptor_pair"]["tunnel"]
tunnel_path = tunnel_def if type(tunnel_def) == str else tunnel_def["path"]
experiment_name = f"r{get_basename(receptor_path)}-l{get_basename(ligand_path)}-t{get_basename(tunnel_path)}-d{direction.name.lower()}-{trajectory_type.name.lower()}"
log.info(f"Running experiment {experiment_name} [{current_experiment_num}/{total_experiment_num}]")
log.info(f"Experiment settings: receptor={receptor_path}, tunnel={tunnel_path}, ligand={ligand_path}, direction={direction.name}, trajectory_type={trajectory_type.name}, seed={seed or '<default>'}, exhaustiveness={exhaustiveness or '<default>'}")
experiment = Experiment(workdir, experiment_name, mgltools, caverdock)
# The following lines are optional, they will copy the inputs to the experiment directory
experiment.store_input_file(receptor_path)
experiment.store_input_file(ligand_path)
experiment.store_input_file(tunnel_path)
Prepare the receptor and ligand pdbqt files, discretize the tunnel. You can extend the tunnel if needed (use correct indentation).
receptor = Receptor(receptor_path) if receptor_path.endswith(".pdbqt") else experiment.prepare_receptor(receptor_path)
ligand = Ligand(ligand_path) if ligand_path.endswith(".pdbqt") else experiment.prepare_ligand(ligand_path)
discretized_tunnel = experiment.discretize_tunnel(tunnel_path, delta=dictval(tunnel_def, "discretization_delta", 0.3, float))
# Extend tunnel
extended_tunnel = discretized_tunnel
ext_distance = dictval(tunnel_def, ["extension", "distance"], 2, float)
if ext_distance > 0:
extended_tunnel = discretized_tunnel + discretized_tunnel.extend(
distance=ext_distance, step=dictval(tunnel_def, ["extension", "step"], 0.2, float),
)
Run CaverDock (use correct indentation).
trajectory = experiment.run_caverdock(
ligand, receptor, extended_tunnel, direction, trajectory_type,
seed=seed,
catomnum=dictval(ligand_def, "drag_atom", fn=int),
exhaustiveness=exhaustiveness
)
Create the energy profile file from the pdbqt file and gather all the results (use correct indentation).
eprofile = trajectory.energy_profile
eprofile.write_to_dat(experiment.result_path("profile.dat"))
disc_ranges_vals = dictval(tunnel_def, "disc_ranges")
disc_ranges = None
if disc_ranges_vals:
disc_ranges = DiscRanges(
disc_ranges_vals["bound"][0], disc_ranges_vals["bound"][1],
disc_ranges_vals["max"][0], disc_ranges_vals["max"][1],
disc_ranges_vals["surface"][0], disc_ranges_vals["surface"][1],
dictval(screening, "type", DiscRanges.type, lambda x: BoundaryType[x.upper()])
)
analysis = eprofile.analyse(trajectory_type=trajectory_type, disc_ranges=disc_ranges)
# Gather results
af = convert_eprofile_analysis(experiment_name, analysis, receptor_path, tunnel_path, ligand_path)
af.to_csv(experiment.result_path("analysis.csv"), index=False)
analysis_data = pd.concat((analysis_data, af))
plots.append(EnergyProfilePlot(experiment_name, eprofile, trajectory_type))
current_experiment_num += 1
Generate the simplified report and draw plots.
analysis_data.to_csv(os.path.join(workdir.directory, "results.csv"), index=False)
log.info("Plotting results")
plotconf = dictval(batch, "plot", {})
plot_params = {
"share_axes": dictval(plotconf, "share_axes", fn=bool),
"force_active_site_location": dictval(plotconf, "active_site_location", fn=lambda x: ActiveSiteLocation[x.upper()]),
"lowerbound_color": dictval(plotconf, "lowerbound_color"),
"upperbound_color": dictval(plotconf, "upperbound_color"),
"radius_color": dictval(plotconf, "radius_color"),
"max_plots_per_row": dictval(plotconf, "plots_per_row"),
"show_zero_energy_line": dictval(plotconf, "show_zero_energy_line"),
"zero_energy_line_color": dictval(plotconf, "zero_energy_line_color")
}
plot_params = {k: v for k, v in plot_params.items() if v is not None}
plot_results(plots, output=os.path.join(workdir.directory, "out.png"), **plot_params)
Complete script for download: api-example-batch.py
Example case data: test-batch.tar.gz