Benchstab client
Client
- class benchstab.client.BenchStab(input_file: str | DataFrame, outfolder: str | None = None, predictor_config: Dict[str, Any] | None = None, include: List[str] | None = None, exclude: List[str] | None = None, allow_struct_predictors: bool = True, allow_sequence_predictors: bool = True, verbosity: int = 0, permissive: bool = False, *args, **kwargs)
Bases: object
The BenchStab class is responsible for managing the predictors. It selects the predictors based on the input data and runs them. The results are returned as a pandas DataFrame. If an outfolder is specified, the results will be saved there as a CSV file.
- sequence_predictors = [IMutant2, IMutant3, INPS, iStable, DDGun, Mupro, SAAFEC, PONSol2, Prostata]
- pdbid_predictors = [IMutant2, IMutant3, CUPSAT, AutoMute, INPS, DUET, iStable, DDGun, SDM, Maestro, PremPS, SRide, PoPMuSiC, DDMut, Dynamut2]
- pdbfile_predictors = [IMutant3, DUET, DDGun, mCSM, Mupro, SDM, PremPS, SRide, DDMut, Dynamut2]
- filter_predictors()
Filter the predictors based on the include and exclude lists. If the include list is specified, only the predictors in the include list will be selected. If the exclude list is specified, the predictors in the exclude list will be removed from the list of predictors. If both lists are specified, the exclude list will be ignored.
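The selection rule described above can be sketched as a small standalone function. This is an illustration only: `select_predictors` and its arguments are hypothetical names, not part of the BenchStab API, and predictors are represented by plain strings that are assumed to match case-sensitively.

```python
from typing import List, Optional


def select_predictors(
    available: List[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper mirroring the documented include/exclude rule."""
    if include:
        # An include list takes precedence; the exclude list is ignored.
        wanted = set(include)
        return [name for name in available if name in wanted]
    if exclude:
        unwanted = set(exclude)
        return [name for name in available if name not in unwanted]
    # Neither list given: keep every available predictor.
    return list(available)


names = ["IMutant2", "INPS", "DDGun", "Mupro"]
print(select_predictors(names, include=["INPS", "DDGun"], exclude=["INPS"]))
```

Because both lists are passed in the usage line, the exclude list has no effect and INPS stays selected.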
- __map(row)
Map the predictors to the input data. If the input data is a Fasta object, only sequence predictors will be selected. If the input data is a Pdb object, only structure predictors will be selected. If the input data is a PdbFile object, both sequence and structure predictors will be selected.
- Parameters:
row (pandas.Series) – A row of the input data.
- Returns:
A list of selected predictors.
- Return type:
list
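The mapping rule can be illustrated with placeholder types. The classes and the two shortened predictor lists below are stand-ins invented for this sketch; the real input types and lists are defined by the library, and which structure list applies to which input type is an assumption here.

```python
from typing import List

# Shortened stand-ins for the class-level predictor lists (assumed for the sketch).
SEQUENCE_PREDICTORS = ["INPS", "Mupro"]
STRUCTURE_PREDICTORS = ["CUPSAT", "SDM"]


class Fasta:
    """Placeholder for a sequence-only input."""


class Pdb:
    """Placeholder for a PDB identifier input."""


class PdbFile:
    """Placeholder for a structure file input."""


def map_predictors(identifier: object) -> List[str]:
    """Pick predictor names according to the type of the input record."""
    if isinstance(identifier, Fasta):
        return list(SEQUENCE_PREDICTORS)
    if isinstance(identifier, Pdb):
        return list(STRUCTURE_PREDICTORS)
    if isinstance(identifier, PdbFile):
        # A structure file is described as usable by both predictor kinds.
        return SEQUENCE_PREDICTORS + STRUCTURE_PREDICTORS
    raise TypeError(f"Unsupported input type: {type(identifier).__name__}")
```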
- save_results(results: DataFrame)
Save the results to a CSV file in the outfolder.
- Parameters:
results (pandas.DataFrame) – The results to be saved.
- async __periodically_gather_results()
Gather the results from the predictors. The results are concatenated and saved to a CSV file in the outfolder.
- async __run()
Run the predictors asynchronously and gather the results. The results are returned as a pandas DataFrame.
- Returns:
The results as a pandas DataFrame.
- Return type:
pandas.DataFrame
Preprocessor
- class benchstab.preprocessor.PreprocessorRow(identifier: benchstab.utils.structure.PDB | benchstab.utils.structure.Fasta = None, mutation: str = None, chain: str = None, fasta: benchstab.utils.structure.Fasta = None, fasta_mutation: str = None, ph: float = 7.0, temperature: float = 25.0)
Bases: object
- mutation: str = None
- chain: str = None
- fasta_mutation: str = None
- ph: float = 7.0
- temperature: float = 25.0
- class benchstab.preprocessor.Preprocessor(input: str | list | TextIO, outfolder: str | None = None, permissive: bool = True, verbosity: int = 0, skip_header: bool = False, *args, **kwargs)
Bases: object
The Preprocessor class parses the input file and creates a dataset that can be used by the predictor.
- The input file can be in the following formats:
PDB identifier, mutation and chain
Fasta identifier, mutation, pH and temperature
PDB identifier, mutation, pH and temperature
Fasta identifier, mutation and chain
The class can also generate a summary of the dataset.
- The summary includes the following information:
Number of mutations
Number of proteins
Average number of mutations per protein
Number of mutations with positive charge
Number of mutations with negative charge
Number of mutations with no charge
Number of mutations with acidic chemical properties
Number of mutations with basic chemical properties
Number of mutations with aromatic chemical properties
Number of mutations with aliphatic chemical properties
Number of mutations with hydroxyl chemical properties
Number of mutations with sulfur chemical properties
Number of mutations with amide chemical properties
Number of mutations with non-polar chemical properties
Number of mutations with polar chemical properties
- Parameters:
input (Union[str, list, TextIO]) – Input file containing the protein identifier, mutation and chain
outfolder (str) – Folder where the preprocessed input will be saved
permissive (bool) – If True, the preprocessing script will continue if it encounters an error
verbosity (int) – Verbosity level
skip_header (bool) – If True, the header in the input file will be skipped
- logger = <Logger benchstab.preprocessor (INFO)>
- classmethod print_summary(summary, logger: Logger | None = None)
Print the summary generated by create_summary to stdout using the provided logger or the default logger.
- Parameters:
summary (Dict[str, str]) – Summary to be printed
logger (logging.Logger) – Logger to be used for printing the summary
- Returns:
None
- Return type:
None
- classmethod create_summary(data: PredictorDataset, verbose: bool = True, outfolder: str | None = None, logger: Logger | None = None) -> Dict[str, str]
Create a summary of the dataset.
- The summary includes the following information:
Number of mutations
Number of proteins
Average number of mutations per protein
Number of mutations with positive charge
Number of mutations with negative charge
Number of mutations with no charge
Number of mutations with acidic chemical properties
Number of mutations with basic chemical properties
Number of mutations with aromatic chemical properties
Number of mutations with aliphatic chemical properties
Number of mutations with hydroxyl chemical properties
Number of mutations with sulfur chemical properties
Number of mutations with amide chemical properties
Number of mutations with non-polar chemical properties
Number of mutations with polar chemical properties
- Parameters:
data (PredictorDataset) – Dataset to be summarized
verbose (bool) – If True, the summary will be printed to stdout
outfolder (str) – If provided, the summary will be saved to a file in the provided folder
logger (logging.Logger) – Logger to be used for printing the summary
- Returns:
Dictionary with the summary
- Return type:
Dict[str, str]
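The first three summary entries can be computed as in the following sketch. It assumes the dataset has been reduced to (protein identifier, mutation) pairs; `summarize` is a hypothetical helper, and the charge and chemical-property counts are left out because their residue groupings are not specified here.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize(rows: List[Tuple[str, str]]) -> Dict[str, str]:
    """Count mutations and proteins in (protein, mutation) pairs."""
    per_protein = Counter(protein for protein, _ in rows)
    n_mutations = len(rows)
    n_proteins = len(per_protein)
    # Guard against an empty dataset to avoid division by zero.
    average = n_mutations / n_proteins if n_proteins else 0.0
    return {
        "Number of mutations": str(n_mutations),
        "Number of proteins": str(n_proteins),
        "Average number of mutations per protein": f"{average:.2f}",
    }


rows = [("1CRN", "T2A"), ("1CRN", "A10G"), ("2LZM", "L99A")]
print(summarize(rows))
```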
- parse_fasta(data: str) -> PreprocessorRow
Parse the line containing the fasta identifier and the mutation.
- If the fasta identifier is valid, return:
Fasta object.
mutation in the format WT_RESIDUE + POSITION + MUT_RESIDUE.
pH (default: 7.0 if not supplied).
temperature (default: 25.0 if not supplied).
- Parameters:
data (str) – Line containing the fasta identifier and the mutation
- Returns:
PreprocessorRow containing the fasta object, mutation, pH and temperature
- Return type:
PreprocessorRow
- extract_fasta_from_pdb(identifier: PDB, chain: str, source: str) -> Fasta
Extract fasta record from PDB file.
- parse_fasta_mutation(mutation: str, fasta: Fasta, permissive: bool = True) -> str
Wraps the __parse_fasta_mutation function in a try/except block. If the mutation string is not valid, the function will raise a PreprocessorError with the permissive flag set to True. This flag indicates that the error is not critical and that the preprocessing script can continue.
- Parameters:
mutation (str) – Mutation string to be parsed
fasta (Fasta) – Fasta record
- Returns:
Mutation string in the format WT_RESIDUE + POSITION + MUT_RESIDUE
- Return type:
str
- parse_struct(data: List[str]) -> PreprocessorRow | None
Parse the line containing the protein identifier, the mutation and the chain.
- If the protein identifier is valid, return:
PDB object
mutation in the format WT_RESIDUE + POSITION + MUT_RESIDUE
chain
fasta object
pH (default: 7.0 if not supplied)
temperature (default: 25.0 if not supplied)
- Parameters:
data (List[str]) – Line containing the protein identifier, the mutation and the chain
- Returns:
PreprocessorRow containing the PDB object, mutation, chain, fasta object, pH and temperature
- Return type:
Union[PreprocessorRow, None]
- parse_mutation(mutation: str) -> str
Parse the mutation string and check if it is valid.
- Parameters:
mutation (str) – Mutation string to be parsed
- Returns:
Mutation string in the format WT_RESIDUE + POSITION + MUT_RESIDUE
- Return type:
str
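A validation of the WT_RESIDUE + POSITION + MUT_RESIDUE format could look like the sketch below. The regular expression, the restriction to the 20 standard one-letter codes, the upper-casing of the input and the name `normalize_mutation` are assumptions made for illustration, not the library's implementation.

```python
import re

# The 20 standard amino acids in one-letter code (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_MUTATION_RE = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


def normalize_mutation(mutation: str) -> str:
    """Return the mutation as WT + POSITION + MUT or raise ValueError."""
    match = _MUTATION_RE.match(mutation.strip().upper())
    if match is None:
        raise ValueError(f"Invalid mutation string: {mutation!r}")
    wild_type, position, mutant = match.groups()
    return f"{wild_type}{position}{mutant}"


print(normalize_mutation(" a123g "))
```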
- __parse_fasta_mutation(mutation: str, fasta: Fasta, permissive: bool = False) -> str
Parse the mutation string and check if it is valid. If the mutation is valid, return the mutation string in the format WT_RESIDUE + POSITION + MUT_RESIDUE.
As this function also handles the parsing of the mutation string for the fasta record extracted from PDBs, it is possible that the mutation string is not valid. In this case, the function will raise a PreprocessorError with the permissive flag set to True. This flag indicates that the error is not critical and that the preprocessing script can continue.
- Parameters:
mutation (str) – Mutation string to be parsed
fasta (Fasta) – Fasta record
permissive (bool) – If True, the function will raise a PreprocessorError with the permissive flag set to True
- Returns:
Mutation string in the format WT_RESIDUE + POSITION + MUT_RESIDUE
- Return type:
str
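The idea of an error that carries a permissive flag can be sketched as follows. `MutationError` is a hypothetical stand-in for PreprocessorError, the sequence is a plain string rather than a Fasta record, positions are assumed to be 1-based, and the mutation string is assumed to be already well-formed.

```python
class MutationError(Exception):
    """Hypothetical stand-in for PreprocessorError with a permissive flag."""

    def __init__(self, message: str, permissive: bool = False) -> None:
        super().__init__(message)
        self.permissive = permissive


def check_against_sequence(mutation: str, sequence: str, permissive: bool = False) -> str:
    """Check that the wild-type residue matches the sequence at the position."""
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    # Positions are assumed to be 1-based, hence the offset below.
    in_range = 1 <= position <= len(sequence)
    if not in_range or sequence[position - 1] != wild_type:
        raise MutationError(
            f"{mutation} does not match the sequence", permissive=permissive
        )
    return f"{wild_type}{position}{mutant}"


print(check_against_sequence("K2A", "MKT"))
```

A caller can inspect the `permissive` attribute of the caught error to decide whether to skip the row or abort.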
- parse_line(line: str, sep: str | None = None) -> PreprocessorRow | None
Parse the line containing the protein identifier and mutation (chain).
- If the protein identifier is valid, return:
PDB object
mutation in the format WT_RESIDUE + POSITION + MUT_RESIDUE
chain
fasta object
pH (default: 7.0 if not supplied)
temperature (default: 25.0 if not supplied)
- Parameters:
line (str) – Line containing the protein identifier, the mutation and the chain
sep (str) – Column separator
- Returns:
PreprocessorRow containing the PDB/Fasta object, mutation, chain, fasta object, pH and temperature
- Return type:
Union[PreprocessorRow, None]
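The default handling for pH and temperature can be sketched with a small dataclass. The column order (identifier, mutation, chain, then optional pH and temperature) is an assumption for this example, and `Row` and `split_line` are hypothetical names; the real parser also resolves the identifier into a PDB or Fasta object, which is omitted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    """Simplified stand-in for PreprocessorRow."""
    identifier: str
    mutation: str
    chain: Optional[str] = None
    ph: float = 7.0
    temperature: float = 25.0


def split_line(line: str, sep: Optional[str] = None) -> Optional[Row]:
    """Split one input line; sep=None splits on any whitespace."""
    fields = [field.strip() for field in line.strip().split(sep)]
    if len(fields) < 3 or not all(fields[:3]):
        return None  # too few columns to build a row
    row = Row(identifier=fields[0], mutation=fields[1], chain=fields[2])
    if len(fields) > 3:
        row.ph = float(fields[3])
    if len(fields) > 4:
        row.temperature = float(fields[4])
    return row


print(split_line("1CRN,A10G,A", sep=","))
```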
- __exception_wrapper(func: callable, *args, **kwargs)
Wrap the function call in a try/except block. If the function raises a PreprocessorError or FileNotFoundError, the function will return None and the error will be logged. If the function raises any other exception, the exception will be raised.
- Parameters:
func (callable) – Function to be wrapped
- Returns:
Function result or None
- Return type:
Union[None, Any]
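The wrapper behaviour described above follows a common pattern, sketched here with a hypothetical `RecoverableError` in place of PreprocessorError and a hypothetical function name `call_or_none`.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("sketch.preprocessor")


class RecoverableError(Exception):
    """Hypothetical stand-in for PreprocessorError."""


def call_or_none(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Optional[Any]:
    """Call func; log and swallow expected errors, re-raise everything else."""
    try:
        return func(*args, **kwargs)
    except (RecoverableError, FileNotFoundError) as exc:
        logger.error("%s failed: %s", getattr(func, "__name__", func), exc)
        return None
```

Any exception outside the two listed types is not caught, so it propagates to the caller unchanged.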
- parse() -> PredictorDataset
Initiates the mutation file parsing process.
Base Predictor
- class benchstab.predictors.base.PredictorFlags(webkit: bool = False, group_mutations: bool = False, group_mutations_by: list[str] = <factory>, mutation_delimiter: str = ', ')
Bases: object
Class for storing predictor flags. The flags are used to control the behaviour of the predictor.
- webkit: bool = False
- group_mutations: bool = False
- group_mutations_by: list[str]
- mutation_delimiter: str = ','
- class benchstab.predictors.base.BaseCredentials(username: str = '', password: str = '', email: str = '', url: str = '')
Bases: object
Base class for predictor credentials. The credentials are used to authenticate the user. The credentials are stored in a dictionary and sent as a POST request to the url specified in the credentials. The credentials class variable should be overwritten by the child class.
- username: str = ''
- password: str = ''
- email: str = ''
- url: str = ''
- class benchstab.predictors.base.PredictorHeader(name: str = '', input_type: str = '', classname: str = '', mutation_column: str = 'mutation')
Bases: object
Class for storing predictor headers. The predictor headers are used to identify the predictor.
- name: str = ''
- input_type: str = ''
- classname: str = ''
- mutation_column: str = 'mutation'
- class benchstab.predictors.base.BasePredictor(data: PredictorDataset, flags: PredictorFlags | None = None, outfolder: str | None = None, username: str = '', email: str = 'generic@email.com', password: str = '', wait_interval: int = 60, batch_size: int = -1, verbosity: int = 0, *args, **kwargs)
Bases: object
- Base class for predictors. The class is responsible for the following:
Sending the query to the predictor.
Retrieving the results from the predictor.
Aggregating the results.
Returning the results as a PredictorDataset.
- url = ''
- aggr_columns = {'chain', 'fasta_mutation', 'mutation'}
- credentials
alias of BaseCredentials
- async classmethod is_available_async(url: str) -> str
Check if the predictor is available. This is done by sending a GET request to the specified url. If the request is successful, the predictor is available.
- Parameters:
url (str) – url of the predictor
- Returns:
status of the predictor. ‘Available’ if the predictor is available, ‘Offline’ otherwise
- Return type:
str
- classmethod is_available(url: str) -> str
Check if the predictor is available. This is done by sending a GET request to the specified url. If the request is successful, the predictor is available.
- Parameters:
url (str) – url of the predictor
- Returns:
status of the predictor. ‘Available’ if the predictor is available, ‘Offline’ otherwise
- Return type:
str
- classmethod header()
Return the header of the predictor. The header is used to identify the predictor.
- Returns:
predictor header
- Return type:
PredictorHeader
- async classmethod async_default_callback(index: int, response: ClientResponse, session: ClientSession)
Default callback function for the GET request. It checks if the request was successful and updates the status of the row accordingly.
- Parameters:
index (int) – index of the row
response (aiohttp.ClientResponse) – response of the GET request
session (aiohttp.ClientSession) – aiohttp session
- Returns:
True if the request was successful, False otherwise
- Return type:
bool
- format_mutation(data: str | Dict | DatasetRow) -> str
Format the mutation to the format required by the predictor. This function should be implemented by the child class.
- Parameters:
data (Union[str, Dict, DatasetRow]) – mutation
- Returns:
formatted mutation
- Return type:
str
- prepare_mutation(row: DatasetRow) -> str
Prepare the mutation to be sent to the predictor.
- This function performs the following steps:
Convert the mutation to the format required by the predictor.
Group the mutations if needed.
- Parameters:
row (DatasetRow) – row of the dataset
- Returns:
mutation
- Return type:
str
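Grouping mutations, as controlled by the group_mutations, group_mutations_by and mutation_delimiter flags, can be pictured with the sketch below. It works on plain dictionaries rather than DatasetRow objects, and `group_mutations` as a free function is a hypothetical helper written for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_mutations(
    rows: List[Dict[str, str]],
    by: Sequence[str],
    delimiter: str = ",",
) -> List[Dict[str, str]]:
    """Join the mutations of all rows that share the same key columns."""
    grouped: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in by)
        grouped[key].append(row["mutation"])
    # Dictionaries preserve insertion order, so groups keep first-seen order.
    return [
        {**dict(zip(by, key)), "mutation": delimiter.join(mutations)}
        for key, mutations in grouped.items()
    ]


rows = [
    {"identifier": "1CRN", "chain": "A", "mutation": "T2A"},
    {"identifier": "1CRN", "chain": "A", "mutation": "A10G"},
    {"identifier": "2LZM", "chain": "A", "mutation": "L99A"},
]
print(group_mutations(rows, by=["identifier", "chain"]))
```

Grouping lets a predictor that accepts several mutations per request receive one query per protein and chain instead of one per mutation.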
- async send_query(session: ClientSession, index: int, *args, **kwargs) -> bool
Send the query to the predictor. This function should be implemented by the child class.
- Parameters:
session (aiohttp.ClientSession) – aiohttp session
index (int) – index of the row
- Returns:
True if the query was sent successfully, False otherwise
- Return type:
bool
- async retrieve_result(session: ClientSession, index: int) -> bool
Retrieve the results of the prediction. This function should be implemented by the child class.
- Parameters:
session (aiohttp.ClientSession) – aiohttp session
index (int) – index of the row
- Returns:
True if the prediction was successful, False otherwise
- Return type:
bool
- __prepare_payload(row: DatasetRow) -> Dict
Wrapper around the prepare_payload function. It catches any exceptions and updates the status of the row accordingly.
- Parameters:
row (DatasetRow) – row of the dataset
- Returns:
payload
- Return type:
dict
- prepare_payload(row: DatasetRow) -> Dict
Prepare the payload to be sent to the predictor. This function should be implemented by the child class.
- Parameters:
row (DatasetRow) – row of the dataset
- Returns:
payload
- Return type:
dict
- get_results() -> PredictorDataset
Get the results of the prediction.
- Returns:
prediction results
- Return type:
PredictorDataset
- _aggregate(data) -> List[Dict[Any, Any]]
Helper function aggregating the data into a list of dictionaries.
- Parameters:
data (PredictorDataset) – data to be aggregated
- Returns:
aggregated data
- Return type:
list[DatasetRow]
- setup() -> None
Set up the dataset. This includes grouping mutations, creating the payload, etc.
- async __exception_wrapper(func: Callable, index: int | None = None, *args, **kwargs) -> bool
Wrapper around the async functions. It catches any exceptions and updates the status of the row accordingly. If the exception is HTMLParserError with permissive=True, it returns False, otherwise it returns True.
- Parameters:
func (Callable) – function to be executed
index (int) – index of the row
- Returns:
True if the function was executed successfully, False otherwise
- Return type:
bool
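A wrapper of this kind can be sketched as below, following the Returns description (True on success, False on failure). The `status` dictionary stands in for the per-row status kept by the dataset, and `guarded` is a hypothetical name; the special handling of HTMLParserError is not reproduced.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional


async def guarded(
    func: Callable[..., Awaitable[Any]],
    status: Dict[int, str],
    index: Optional[int] = None,
    *args: Any,
    **kwargs: Any,
) -> bool:
    """Await func and record the outcome for the given row index."""
    try:
        await func(*args, **kwargs)
    except Exception as exc:  # broad on purpose: one bad row must not stop the run
        if index is not None:
            status[index] = f"failed: {exc}"
        return False
    if index is not None:
        status[index] = "done"
    return True
```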
- async login(session: ClientSession, index: int, login_extra: Dict[str, Any] | None = None) -> bool
Login to the predictor. This function should be implemented by the child class.
- Parameters:
session (aiohttp.ClientSession) – aiohttp session
index (int) – index of the row
login_extra (dict) – extra parameters for the login function
- Returns:
True if the login was successful, False otherwise
- Return type:
bool
- async _queue_prediction(queue)
Create a queue of tasks to be executed in parallel. The queue is created from the indices of the dataset. The queue is filled until it reaches the batch_size. If the queue is full, the function waits for the queue to be emptied. If the dataset is exhausted, the function returns.
- Parameters:
queue (asyncio.Queue) – queue of tasks
- async compute()
The main function of the predictor.
- It is responsible for the following:
Check if the predictor is available (if not, return immediately).
Set up the dataset (group mutations, etc.)
Create a queue of tasks to be executed in parallel.
Create a queue of workers to execute the tasks.
Wait for all tasks to be completed (join the queue).
Return the results as a PredictorDataset.
- Returns:
prediction results
- Return type:
PredictorDataset
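The queue-and-workers flow listed for compute() matches the standard asyncio producer/consumer pattern. The sketch below shows that pattern in isolation under simplifying assumptions: rows are plain integer indices, the per-row work is a placeholder, and `producer`, `worker` and `run_all` are hypothetical names.

```python
import asyncio
from typing import List


async def producer(queue: asyncio.Queue, n_rows: int) -> None:
    """Feed row indices into the queue."""
    for index in range(n_rows):
        # put() waits while a bounded queue is full.
        await queue.put(index)


async def worker(queue: asyncio.Queue, results: List[int]) -> None:
    """Process indices until cancelled."""
    while True:
        index = await queue.get()
        try:
            await asyncio.sleep(0)  # placeholder for the real per-row work
            results.append(index)
        finally:
            queue.task_done()


async def run_all(n_rows: int, batch_size: int = 2, n_workers: int = 2) -> List[int]:
    # asyncio.Queue treats maxsize <= 0 as unbounded.
    queue: asyncio.Queue = asyncio.Queue(maxsize=batch_size)
    results: List[int] = []
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    await producer(queue, n_rows)
    await queue.join()  # wait until every queued index has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return sorted(results)


print(asyncio.run(run_all(5)))
```

Bounding the queue by the batch size limits how many rows are in flight at once, which is useful when the remote service throttles requests.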
- async _run_prediction(queue)
Run the prediction. This function is executed in parallel by the workers.
- It takes an index from the queue and executes the following steps:
Login to the predictor.
Send the query to the predictor.
Retrieve the results from the predictor.
- Parameters:
queue (asyncio.Queue) – queue of tasks
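The three steps above run in order for each index. A minimal sketch of such a chain is shown below; stopping at the first step that returns False is an assumption of this example, and `run_steps` and the `Step` alias are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List

# A step takes a row index and reports success as a bool.
Step = Callable[[int], Awaitable[bool]]


async def run_steps(index: int, steps: List[Step]) -> bool:
    """Run the steps in order and stop at the first failure."""
    for step in steps:
        if not await step(index):
            return False
    return True
```

In this picture the list passed to `run_steps` would hold the login, send and retrieve coroutines of a predictor.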
- make_form(payload)
Create a multipart form from a dictionary of parameters. Since the current version (==3.8.5) of aiohttp does not support assigning a custom boundary to the FormData object directly, we need to create a custom MultipartWriter and assign it to the FormData object.
- Parameters:
payload (dict) – dictionary of parameters
- Returns:
multipart form
- Return type:
aiohttp.FormData
- async __get(session: ClientSession, callback: Callable, index: int, *args, **kwargs) -> bool
Wrapper around the aiohttp GET request. It catches any exceptions and updates the status of the row accordingly.
- Parameters:
session (aiohttp.ClientSession) – aiohttp session
callback (Callable) – callback function
index (int) – index of the row
- Returns:
result of the callback function
- Return type:
bool
- async get(session: ClientSession, dataset: DatasetRow | Dict, callback: Callable, index: int | None = None, *args, **kwargs) -> bool
Send a GET request to the predictor. The request is sent to the url specified in the dataset. The response is handled by the callback function. The callback function should return True if the request was successful, False otherwise. If the callback function is not specified, the default callback function is used.
- Parameters:
session (aiohttp.ClientSession) – aiohttp session
dataset (Union[DatasetRow, Dict]) – dataset
callback (Callable) – callback function
index (int) – index of the row
- Returns:
result of the callback function
- Return type:
bool
- async __post(session: ClientSession, callback: Callable, index: int, *args, **kwargs) -> bool
Wrapper around the aiohttp POST request. It catches any exceptions and updates the status of the row accordingly.
- Parameters:
session (aiohttp.ClientSession) – aiohttp session
callback (Callable) – callback function
index (int) – index of the row
- Returns:
result of the callback function
- Return type:
bool
- async post(session: ClientSession, dataset: DatasetRow | Dict, callback: Callable, index: int | None = None, *args, **kwargs) -> bool
Send a POST request to the predictor. The request is sent to the url specified in the dataset. The response is handled by the callback function. The callback function should return True if the request was successful, False otherwise. If the callback function is not specified, the default callback function is used.
- Parameters:
session (aiohttp.ClientSession) – aiohttp session
dataset (Union[DatasetRow, Dict]) – dataset
callback (Callable) – callback function
index (int) – index of the row
- Returns:
result of the callback function
- Return type:
bool
- class benchstab.predictors.base.BasePostPredictor(data: PredictorDataset, flags: PredictorFlags | None = None, outfolder: str | None = None, username: str = '', email: str = 'generic@email.com', password: str = '', wait_interval: int = 60, batch_size: int = -1, verbosity: int = 0, *args, **kwargs)
Bases: BasePredictor
Base class for predictors that require a POST request. The POST request is sent to the url specified in the dataset. The response is handled by the default_post_handler.
- async send_query(session: ClientSession, index: int, *args, **kwargs) -> bool
Send the query to the predictor. If the predictor is a form-data predictor, the query is sent as a multipart form. Otherwise, it is sent as a JSON object.
- Parameters:
session (aiohttp.ClientSession) – aiohttp session
index (int) – index of the row
- Returns:
True if the query was sent successfully, False otherwise
- Return type:
bool
- async default_post_handler(index: int, response: ClientResponse, session: ClientSession)
Default callback function for the POST request. It checks if the request was successful and updates the status of the row accordingly.
- Parameters:
index (int) – index of the row
response (aiohttp.ClientResponse) – response of the POST request
session (aiohttp.ClientSession) – aiohttp session
- Returns:
True if the request was successful, False otherwise
- Return type:
bool
- class benchstab.predictors.base.BaseAuthentication(data: PredictorDataset, flags: PredictorFlags | None = None, outfolder: str | None = None, username: str = '', email: str = 'generic@email.com', password: str = '', wait_interval: int = 60, batch_size: int = -1, verbosity: int = 0, *args, **kwargs)
Bases: BasePostPredictor
Base class for predictors that require authentication. The authentication is done by sending a POST request to the url specified in the credentials. The response is handled by the login_handler.
- async login(session: ClientSession, index: int, login_extra: Dict[str, Any] | None = None) -> bool
Login to the predictor. The login is done by sending a POST request to the url specified in the credentials. The response is handled by the login_handler function. The login_handler function should return True if the login was successful, False otherwise. The function uses the credentials specified in the credentials class variable.
- Parameters:
session (aiohttp.ClientSession) – aiohttp session
index (int) – index of the row
login_extra (dict) – extra parameters for the login function
- Returns:
True if the login was successful, False otherwise
- Return type:
bool
- async login_handler(index: int, response: ClientResponse, session: ClientSession) -> bool
Default callback function for the login request. It checks if the login was successful and updates the status of the row accordingly.
- Parameters:
index (int) – index of the row
response (aiohttp.ClientResponse) – response of the login request
session (aiohttp.ClientSession) – aiohttp session
- Returns:
True if the login was successful, False otherwise
- Return type:
bool
- class benchstab.predictors.base.BaseGetPredictor(max_retries: int = 100, *args, **kwargs)
Bases: BasePostPredictor
Base class for predictors that require a GET request. The GET request is sent to the url specified in the dataset. The response is handled by the default_get_handler.
- async retrieve_result(session: ClientSession, index: int) -> bool
Retrieve the results of the prediction. The results are retrieved by sending a GET request to the url specified in the dataset. If the data point has already been processed, the function returns True; otherwise it returns the result of the default_get_handler function.
- Parameters:
session (aiohttp.ClientSession) – aiohttp session
index (int) – index of the row
- Returns:
True if the request was successful, False otherwise
- Return type:
bool
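The max_retries and wait_interval parameters suggest a polling loop around the GET request. The sketch below shows one way such a loop could work; it is an assumption about the behaviour rather than the library's code, `poll` is a hypothetical helper, and `fetch` stands in for a single GET attempt that reports whether the result is ready.

```python
import asyncio
from typing import Awaitable, Callable


async def poll(
    fetch: Callable[[], Awaitable[bool]],
    max_retries: int = 100,
    wait_interval: float = 60.0,
) -> bool:
    """Call fetch until it succeeds or the retry budget is used up."""
    for _ in range(max_retries):
        if await fetch():
            return True
        # Give the remote job time to finish before asking again.
        await asyncio.sleep(wait_interval)
    return False
```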
- async default_get_handler(index: int, response: ClientResponse, session: ClientSession) -> bool
Default callback function for the GET request. It checks if the request was successful and updates the status of the row accordingly.
- Parameters:
index (int) – index of the row
response (aiohttp.ClientResponse) – response of the GET request
session (aiohttp.ClientSession) – aiohttp session
- Returns:
True if the request was successful, False otherwise
- Return type:
bool