Commit ac69a9e6 by Christian Margreitter

Version 1.0.0

__pycache__
*.pyc
package.json
.vscode
tags
.idea
tests/junk
!tests/junk/desc.txt
dockstream/config/tests_config/config.json
dockstream.log
dockstream.err
.directory
# Changelog of `DockStream`
## 1.0.0 - 2021-08-02 (RC)
### Added
- Minimal environment definitions.
### Fixed
- Fixed bug in `Schrodinger`/`LigPrep` sublist generation ('0' was interpreted as '1').
### Internal
- Added additional unit test for `Schrodinger`/`LigPrep` CLI application.
## 0.2.2 - 2020-04-30
### Added
- Extended input overwrite capabilities for `docker.py` (CSV specification).
- New parsing capability to load SDF input with `DockStream` nomenclature to ensure enumerations are properly assigned.
- Exposed additional command-line parameters for `Schrodinger`/`LigPrep`.
- Added execution of `Schrodinger`/`LigPrep` on AWS (workaround for a "relative paths" bug on Schrodinger's side).
### Fixed
- Hotfix to improve `Corina` result parsing.
### Internal
- Show-case of how to execute `Schrodinger`/`LigPrep` on AWS including a token guard.
- Show-case of how to execute `Schrodinger`/`Glide` on AWS including a token guard.
## 0.2.1 - 2021-04-16
### Added
- Additional measures to protect against failing ligands at the embedding stage.
- Added `OpenEye`/`OMEGA` embedder.
### Fixed
- Hotfix to increase stability of `RDkit` ligand preparator.
### Internal
- Added unit test to cover changes in parameters for `LigPrep` when using `EPIK`.
- Improved logging for embedding stage.
- Reverted file system handling for `Glide`.
## 0.2.0 - 2021-03-31
### Added
- New backend `Hybrid` (`OpenEye`) added (will replace the API version shortly).
- Added stereo-isomer support using `RDkit`.
- Added stereo-isomer support using `Corina`.
- Exposed `Corina`'s "-d" options.
- Added support for `Ligprep`'s filtering capabilities.
- Enhanced stability for parallel execution mode for `Hybrid`, `Gold` and `Glide`.
- Benchmarking script added.
- Analysis script added.
### Fixed
- Fixed instability with protonations when using `RDkit` ligand embedding.
- Fixed logging bug within `Ligprep`/`Schrodinger`.
- Hot-fix for problem with CSV write-out when no pose was accepted.
- Improved logging for `Ligprep`/`Schrodinger`.
### Internal
- Introduced stereo-enumeration factory class.
- Added `pydantic`-based parsing.
- Optimized log messages.
- Removed most versions from environment specifications.
## 0.1.4 - 2020-11-03
### Added
- Added support for parallelization for `Ligprep`/`Schrodinger` embedding.
- Added `best_per_ligand` write-out mode for conformers.
- Added CSV-option for entry point `sdf2smiles.py`.
### Fixed
- Fixed bug for `OpenEye` backend, when `max_compounds_per_subjob` was set to 0.
- Fixed bug in `best_per_enumeration` write-out mode for CSV results.
- Fixed irregularity with `Gold` conformer ordering (fitness versus score).
- Fixed naming bug in `OpenEye` backend.
### Internal
- Extended `AutoDock Vina` unit tests for tautomer compounds.
- Small update of `Ligprep`/`Schrodinger` unit tests.
- Updated docking backend unit tests to fully cover enumerations.
- Added unit test for `Glide`/`Schrodinger` constraints.
## 0.1.3 - 2020-09-29
### Added
- Added `OpenBabel`/`AutoDock Vina` target preparation (generating `PDBQT` files).
- Added `OpenBabel`/`AutoDock Vina` box extraction (based on XYZ coordinate ranges of template ligand).
- Added `AutoDock Vina` backend and result parser.
- Added possibility to specify binary path for external binary executions.
### Fixed
- Fixed issue with logfile logging for `Glide` when time was exceeded.
- Fixed issue with `OpenBabel` binary call when environment was not loaded.
### Internal
- Implemented hard over-write of `POSE_OUTTYPE` to be the only supported type.
- Redesign of internal dictionary usage.
- Addition of unit tests.
- Added logging of the version number (and started tagging versions).
- Changed internal structure to be "package ready".
- Improved the tag adding (and extended it for the ligand preparation step).
## 0.1.2 - 2020-09-09
### Added
- Integration of a "progress bar" to docking jobs.
- Added support for parameter `max_compounds_per_subjob` to rein in (sub-)lists that are too long (especially with `Glide`).
- Added option to only output the best scores per enumeration.
- Added `ligprep` to available ligand embedding techniques.
### Fixed
- Fixed issues with "internal alignment" and integrated a fail-safe version.
- Made all receptor paths `lists` rather than simple strings to streamline the interface prior to the implementation of `ensemble docking`.
### Internal
- Clean-up of "OpenEye" result parser.
- Refactored result parsers.
- Improved `Gold` feedback for docking.
- Restructuring of internal "Ligand" handling.
- Refactored "docker.py" entry point.
- Added result parser output checks to respective unit tests.
- Refactored some methods of the ligand preparation tools.
- Updated example configuration files.
## 0.1.1 - 2020-08-17
### Added
- Added possibility to change the logfile path.
- Added parameter to change the time limit per compound for `Glide` docking.
- Added support to set a prefix for the output files.
- Refactored and extended tagging system for all backends (adds "smiles" and "original_smiles" now).
- Added option to only output the best poses per enumeration.
- Added transformation support (using SMIRKS and `OpenEye`) for ligand preparation.
- Added `Glide`/`Schrodinger` token guard.
- Added support of "SDF" files as input.
- Added support of arbitrary names to the parsing of CSV files as input.
- Added "-debug" parameter to entry points.
### Fixed
- Critical fix for score aggregation if enumerations were used.
- Made "Corina" embedding much more stable.
- Several minor bug fixes in the logging write-out.
### Internal
- Added "ChangeLog.md".
- Changed entry point structure (refactored and harmonized).
- Fixed issue with "min"/"max" docking scoring directions (Gold/CCDC backend).
# `DockStream`
![alt text](DockStream.jpg)
## Description
DockStream is a docking wrapper providing access to a collection of ligand embedders and docking backends.
Docking execution and post hoc analysis can be automated via the benchmarking and analysis workflow. The
flexibility to specify a large variety of docking configurations allows tailored protocols for diverse
end applications. DockStream can also parallelize docking across CPU cores, increasing throughput.
DockStream is integrated with the de novo design platform, [REINVENT](https://github.com/MolecularAI/Reinvent),
allowing one to incorporate docking into the generative process, thus providing the agent with 3D
structural information.
## Supported Backends
### Ligand Embedders
* **[`RDKit`](https://www.rdkit.org/docs/GettingStartedInPython.html#working-with-3d-molecules)**
* **[`Corina`](https://www.mn-am.com/products/corina)**
* **[OpenEye's `OMEGA`](https://www.eyesopen.com/omega)**
* **[Schrodinger's `LigPrep`](https://www.schrodinger.com/products/ligprep)**
* **[`TautEnum`](https://github.com/OpenEye-Contrib/TautEnum/blob/master/README)**
### Docking Backends
* **[`AutoDock Vina`](http://vina.scripps.edu/index.html)**
* **[`rDock`](http://rdock.sourceforge.net)**
* **[OpenEye's `Hybrid`](https://www.eyesopen.com/oedocking-tk)**
* **[Schrodinger's `Glide`](https://www.schrodinger.com/glide)**
* **[CCDC's `GOLD`](https://www.ccdc.cam.ac.uk/solutions/csd-discovery/components/gold)**
Note that the `CCDC` package, the `OpenEye` toolkit and `Schrodinger`'s tools require you to obtain the respective software from those vendors.
## Tutorials and Usage
Detailed `Jupyter Notebook` tutorials for all `DockStream` functionalities and workflows are provided in
[DockStreamCommunity](https://github.com/MolecularAI/DockStreamCommunity). The `DockStream` repository here
contains input `JSON` templates located in [examples](https://github.com/MolecularAI/DockStreamCommunity/examples).
The templates are organized as follows:
* `target_preparation`: Preparing targets for docking
* `ligand_preparation`: Generating 3D coordinates for ligands
* `docking`: Docking ligands
* `integration`: Combining different ligand embedders and docking backends into a single input `JSON` to run successively
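A minimal docking configuration illustrates the overall `JSON` layout; the key names and values below are assumptions for illustration only — consult the templates and tutorials above for authoritative examples:

```
{
  "docking": {
    "docking_runs": [{
      "backend": "AutoDockVina",
      "output": {
        "scores": {
          "mode": "best_per_ligand"
        }
      }
    }]
  }
}
```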
## Requirements
Two Conda environments are provided: `DockStream` via `environment.yml` and `DockStreamFull` via `environment_full.yml`.
`DockStream` suffices for all use cases except when `CCDC GOLD` software is used, in which case `DockStreamFull` is required.
```
git clone <DockStream repository>
cd <DockStream directory>
conda env create -f environment.yml
conda activate DockStream
```
## Enable use of OpenEye software (from [REINVENT README](https://github.com/MolecularAI/Reinvent))
You will need to set the environment variable `OE_LICENSE` to activate the OpenEye (`oechem`) license.
One way to do this, while keeping it specific to the Conda environment, is:
On the command-line, first:
```
cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh
```
Then edit ```./etc/conda/activate.d/env_vars.sh``` as follows:
```
#!/bin/sh
export OE_LICENSE='/opt/scp/software/oelicense/1.0/oe_license.seq1'
```
and finally, edit ```./etc/conda/deactivate.d/env_vars.sh``` as follows:
```
#!/bin/sh
unset OE_LICENSE
```
## Unit Tests
After cloning the `DockStream` repository, enable licenses, if applicable (`OpenEye`, `CCDC`, `Schrodinger`). Then execute the following:
```
python unit_tests.py
```
## Contributors
Christian Margreitter (christian.margreitter@astrazeneca.com)
Jeff Guo (jeff.guo@astrazeneca.com)
Alexey Voronov (alexey.voronov1@astrazeneca.com)
# import the containers (effectively python dictionaries)
# make all Enums accessible
# rDock
# ---------
# OpenEye
# ---------
import os
import json
import errno
import sys
import argparse

from dockstream.utils.execute_external.execute import Executor
from dockstream.utils import files_paths
from dockstream.utils.enums.docking_enum import DockingConfigurationEnum

_DC = DockingConfigurationEnum()


def run_script(input_path: str) -> dict:
    """This method takes an input path to either a folder containing DockStream json files or a single json file and
    returns a dictionary whose keys are the json names and whose values are the paths to the json files.
    The dictionary will be looped over later to run DockStream.

    :param input_path: path to either a folder of json files or a single json file
    :raises FileNotFoundError: this error is raised if input_path is neither a folder nor a file
    :return: dictionary, keys are the DockStream json names and values are the paths to them
    """
    # first check if input_path is valid (either a folder containing json files or a single json file)
    if not os.path.isdir(input_path) and not os.path.isfile(input_path):
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), input_path)

    # if input_path is a folder, ensure it is not empty and that it contains at least 1 json file
    if os.path.isdir(input_path):
        if not os.listdir(input_path):
            sys.exit(input_path + ' folder is empty. Please ensure your DockStream json files are added to the folder.')
        elif not any(file.endswith('.json') for file in os.listdir(input_path)):
            sys.exit(input_path + ' contains no json files. Please ensure your DockStream json files are added to the folder.')

    # if input_path is a single file, check that it is in json format
    if os.path.isfile(input_path):
        if not input_path.endswith('.json'):
            sys.exit(input_path + ' is not a json file. Please ensure it is in json format.')

    # initialize a dictionary to hold all DockStream runs
    batch_runs = {}
    # loop through all json files and update the paths if input_path is a directory
    if os.path.isdir(input_path):
        all_runs = [file for file in os.listdir(input_path) if file.endswith('.json')]
        # note: the loop variable must not be called "json", as that would shadow the imported module
        for json_file in all_runs:
            batch_runs[json_file.replace('.json', '')] = os.path.join(input_path, json_file)
    # at this point, input_path must be a single json file
    else:
        json_name = os.path.basename(os.path.normpath(input_path)).replace('.json', '')
        batch_runs[json_name] = input_path

    return batch_runs
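The mapping behaviour of `run_script` can be sketched in isolation; `collect_json_runs` below is a hypothetical, self-contained approximation using `pathlib`, not part of DockStream:

```python
import json
import tempfile
from pathlib import Path

def collect_json_runs(input_path: str) -> dict:
    # hypothetical stand-in mirroring run_script: map run names to json paths
    p = Path(input_path)
    if p.is_dir():
        return {f.stem: str(f) for f in sorted(p.glob("*.json"))}
    return {p.stem: str(p)}

with tempfile.TemporaryDirectory() as tmp:
    # two dummy configuration files stand in for real DockStream configs
    for name in ("glide_run", "vina_run"):
        (Path(tmp) / f"{name}.json").write_text(json.dumps({}))
    runs = collect_json_runs(tmp)
    # keys are the file names without the ".json" suffix
```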
if __name__ == '__main__':
    # take user specified input parameters to run the benchmarking script
    parser = argparse.ArgumentParser(description='Facilitates batch DockStream execution.')
    parser.add_argument('-input_path', type=str, required=True,
                        help='The path to either a folder of DockStream json files or a single json file.')
    args = parser.parse_args()

    batch_runs = run_script(args.input_path)
    executor = Executor()

    # initialize a dictionary to store the names of all runs that did not enforce "best_per_ligand"
    non_bpl_runs = {}

    # loop through all user json files and run DockStream
    for trial_name, json_path in batch_runs.items():
        # check if the current DockStream run has "best_per_ligand" enforced
        with open(json_path, "r") as f:
            parameters = json.load(f)
        # in case the output mode was not specified in the configuration json
        try:
            for docking_run in parameters[_DC.DOCKING][_DC.DOCKING_RUNS]:
                output_mode = docking_run[_DC.OUTPUT][_DC.OUTPUT_SCORES][_DC.OUTPUT_MODE]
                if output_mode != _DC.OUTPUT_MODE_BESTPERLIGAND:
                    non_bpl_runs[trial_name] = output_mode
                    break
        except (KeyError, TypeError):
            pass

        print(f'Running {trial_name}')
        result = executor.execute(command=sys.executable,
                                  arguments=[files_paths.attach_root_path('docker.py'),
                                             '-conf', json_path, '-debug'],
                                  check=False)
        print(result)

        # print out error messages (if applicable) for the current DockStream run
        if result.returncode != 0:
            print(f'There was an error with the {trial_name} DockStream run.')
            print(result.stdout)
            print(result.stderr)

    if non_bpl_runs:
        # print the names of the runs which did not enforce "best_per_ligand"
        print(f"List of runs which did not have 'best_per_ligand' specified. These runs cannot be "
              f"passed into the analysis script: {non_bpl_runs}")
import os
import argparse

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import spearmanr
from scipy.stats import kendalltau
from matplotlib import pyplot as plt
import seaborn as sns


def to_scatter_plot(file_name: str, docking_scores: np.ndarray, experimental_scores: np.ndarray):
    sns.set()
    plt.scatter(docking_scores, experimental_scores, color='g')
    plt.title(file_name + ' Scatter Plot')
    plt.xlabel('Docking Score (-kcal/mol)')
    plt.ylabel('Experimental Binding (-kcal/mol)')
    plt.savefig(file_name.replace('csv', ''), dpi=300)
    plt.figure()


def output_analysis(data_dir: str, exp_data_path: str, output_dir: str) -> dict:
    docking_results = [file for file in os.listdir(data_dir) if file.endswith('.csv')]
    analysis_results = {}
    for file_name in docking_results:
        data_path = os.path.join(data_dir, file_name)
        # load the docking results
        docking_data = pd.read_csv(data_path)
        # load the experimental comparison data
        experimental_data = pd.read_csv(exp_data_path)
        # merge the docking results and experimental binding data based on their 'smiles' identifiers.
        # this is to ensure that data pertaining to the correct 'smiles' ligands are compared irrespective
        # of the order in which the user provides the experimental data
        comparison_df = docking_data.merge(experimental_data, on='smiles')
        # extract the DockStream docking scores and experimental potency parameter columns
        docking_scores = comparison_df['score'].to_numpy().reshape(-1, 1)
        experimental_data = comparison_df['exp_binding'].to_numpy()
        # fit a linear model using least squares
        model = LinearRegression().fit(docking_scores, experimental_data)
        coeff_determination = model.score(docking_scores, experimental_data)
        # perform Spearman correlation analysis
        spearman_correlation = spearmanr(docking_scores, experimental_data)
        # perform Kendall correlation analysis
        kendall_correlation = kendalltau(docking_scores, experimental_data)
        # store the linear model, Spearman and Kendall quantities in a dictionary
        analysis_results[file_name] = {'coefficient of determination': coeff_determination,
                                       'Spearman correlation': spearman_correlation,
                                       'Kendall correlation': kendall_correlation}
        # call to_scatter_plot to create a scatter plot of docking_scores against experimental_data
        output_path = os.path.join(output_dir, file_name)
        to_scatter_plot(output_path, docking_scores, experimental_data)

    return analysis_results


if __name__ == '__main__':
    # take user specified input parameters to run the analysis script
    parser = argparse.ArgumentParser(description='Implements entry point to results analysis.')
    parser.add_argument('-data_dir', type=str, required=True,
                        help='The path to the output csv files to be analyzed.')
    parser.add_argument('-exp_data_path', type=str, required=True,
                        help='The path to the experimental binding data of your ligand library. '
                             'This will be used for regression analysis.')
    parser.add_argument('-output_dir', type=str, required=True,
                        help='The desired output folder path to store the analyzed results and plots.')
    # parser.add_argument('-regression_threshold', type=int, default=0.75, help='The desired regression threshold to classify good/poor backend performances based on your receptor-ligand system. The default value if not specified is 0.75.')
    args = parser.parse_args()

    analysis_results = output_analysis(args.data_dir, args.exp_data_path, args.output_dir)
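Two of the reported quantities (the coefficient of determination and the Spearman correlation) can be reproduced on toy data without the plotting machinery; the scores below are made up purely for illustration:

```python
import numpy as np

# toy, made-up scores: the two arrays are monotonically related
docking = np.array([-9.1, -8.4, -7.9, -6.5])
exp_binding = np.array([-8.8, -8.0, -7.5, -6.9])

# coefficient of determination (R^2) of a least-squares line,
# which is what LinearRegression().score reports
slope, intercept = np.polyfit(docking, exp_binding, 1)
pred = slope * docking + intercept
r2 = 1 - ((exp_binding - pred) ** 2).sum() / ((exp_binding - exp_binding.mean()) ** 2).sum()

# Spearman correlation: Pearson correlation of the ranks
def _ranks(a):
    return np.argsort(np.argsort(a))

rho = np.corrcoef(_ranks(docking), _ranks(exp_binding))[0, 1]
```

For any strictly monotone pair of score vectors, `rho` evaluates to 1.0 even when the relationship is not perfectly linear (so `r2` stays below 1).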
{
"version": 1,
"disable_existing_loggers": false,
"formatters": {
"standard": {
"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
"datefmt": "%Y-%m-%d %H:%M:%S"
},
"blank": {
"format": "%(message)s"
}
},
"handlers": {
"console": {
"class": "logging.StreamHandler",
"level": "DEBUG",
"formatter": "standard",
"stream": "ext://sys.stderr"
},
"file_handler": {
"class": "logging.handlers.RotatingFileHandler",
"level": "DEBUG",
"formatter": "standard",
"filename": "dockstream.log",
"maxBytes": 10485760,
"backupCount": 20,
"encoding": "utf8"
},
"file_handler_blank": {
"class": "logging.handlers.RotatingFileHandler",
"level": "DEBUG",
"formatter": "blank",
"filename": "dockstream.log",
"maxBytes": 10485760,
"backupCount": 20,
"encoding": "utf8"
}
},
"loggers": {
"command_line_interface": {
"level": "DEBUG",
"handlers": ["file_handler"],
"propagate": false
},
"target_preparation": {
"level": "DEBUG",
"handlers": ["file_handler"],
"propagate": false
},
"ligand_preparation": {
"level": "DEBUG",
"handlers": ["file_handler"],
"propagate": false
},
"docking": {
"level": "DEBUG",
"handlers": ["file_handler"],
"propagate": false
},
"blank": {
"level": "DEBUG",
"handlers": ["file_handler_blank"],
"propagate": false
}
},
"root": {
"level": "DEBUG",
"handlers": ["file_handler"]
}
}
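Configurations of this shape are consumed by Python's standard `logging.config.dictConfig`. A trimmed sketch (console handler only, so the example needs no `dockstream.log` on disk) shows how such a file is applied:

```python
import logging
import logging.config

# trimmed-down config mirroring the structure above; a StreamHandler replaces
# the rotating file handler so that no log file is written
config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"standard": {"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"}},
    "handlers": {"console": {"class": "logging.StreamHandler",
                             "level": "DEBUG",
                             "formatter": "standard"}},
    "loggers": {"docking": {"level": "DEBUG",
                            "handlers": ["console"],
                            "propagate": False}},
}
logging.config.dictConfig(config)
logging.getLogger("docking").debug("docking logger configured")
```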
{
"version": 1,
"disable_existing_loggers": false,
"formatters": {
"standard": {
"format": "%(asctime)s - %(message)s",
"datefmt": "%Y-%m-%d %H:%M:%S"
},
"blank": {
"format": "%(message)s"
}
},
"handlers": {
"console": {
"class": "logging.StreamHandler",
"level": "INFO",
"formatter": "standard",
"stream": "ext://sys.stderr"
},
"file_handler": {
"class": "logging.handlers.RotatingFileHandler",
"level": "INFO",
"formatter": "standard",
"filename": "dockstream.log",
"maxBytes": 10485760,
"backupCount": 20,
"encoding": "utf8"
},
"file_handler_blank": {
"class": "logging.handlers.RotatingFileHandler",
"level": "INFO",
"formatter": "blank",
"filename": "dockstream.log",
"maxBytes": 10485760,
"backupCount": 20,
"encoding": "utf8"
}
},
"loggers": {
"command_line_interface": {
"level": "INFO",
"handlers": ["file_handler"],
"propagate": false
},
"target_preparation": {
"level": "INFO",
"handlers": ["file_handler"],
"propagate": false
},
"ligand_preparation": {
"level": "INFO",
"handlers": ["file_handler"],
"propagate": false
},
"docking": {
"level": "INFO",
"handlers": ["file_handler"],
"propagate": false
},
"blank": {
"level": "INFO",
"handlers": ["file_handler_blank"],
"propagate": false
}
},
"root": {
"level": "INFO",
"handlers": ["file_handler"]
}
}
{
"version": 1,
"disable_existing_loggers": false,
"formatters": {
"standard": {
"format": "%(asctime)s - %(message)s",
"datefmt": "%Y-%m-%d %H:%M:%S"
},
"blank": {
"format": "%(message)s"
}
},
"handlers": {
"console": {
"class": "logging.StreamHandler",
"level": "INFO",
"formatter": "standard",
"stream": "ext://sys.stderr"
},
"file_handler": {
"class": "logging.handlers.RotatingFileHandler",
"level": "INFO",
"formatter": "standard",
"filename": "dockstream.log",
"maxBytes": 10485760,
"backupCount": 20,
"encoding": "utf8"
},
"file_handler_blank": {
"class": "logging.handlers.RotatingFileHandler",
"level": "INFO",
"formatter": "blank",
"filename": "dockstream.log",
"maxBytes": 10485760,
"backupCount": 20,
"encoding": "utf8"
}
},
"loggers": {
"command_line_interface": {
"level": "INFO",
"handlers": ["file_handler"],
"propagate": false
},
"target_preparation": {
"level": "INFO",
"handlers": ["file_handler"],
"propagate": false
},
"ligand_preparation": {
"level": "INFO",
"handlers": ["file_handler"],
"propagate": false
},
"docking": {
"level": "INFO",
"handlers": ["file_handler"],
"propagate": false
},
"blank": {
"level": "INFO",
"handlers": ["file_handler_blank"],
"propagate": false
}
},
"root": {
"level": "INFO",
"handlers": ["file_handler"]
}
}
RBT_PARAMETER_FILE_V1.00
TITLE rDock_default_cavity_reference_ligand
RECEPTOR_FILE <RECEPTOR_MOL2_FILE_ABSOLUTE_PATH>
RECEPTOR_FLEX 3.0
##################################################################
### CAVITY DEFINITION: REFERENCE LIGAND METHOD
##################################################################
SECTION MAPPER
SITE_MAPPER RbtLigandSiteMapper
REF_MOL <REFERENCE_SDF_FILE_ABSOLUTE_PATH>
RADIUS 6.0
SMALL_SPHERE 1.0
MIN_VOLUME 100
MAX_CAVITIES 1
VOL_INCR 0.0
GRIDSTEP 0.5
END_SECTION
#################################
#CAVITY RESTRAINT PENALTY
#################################
SECTION CAVITY
SCORING_FUNCTION RbtCavityGridSF
WEIGHT 1.0
END_SECTION
{
"OE_LICENSE": "/opt/scp/software/oelicense/1.0/oe_license.seq1",
"CSDHOME": "/opt/scp/software/ccdc/2020.0.1/CSD_2020"
}
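A key/value file like this can be pushed into the process environment with a few lines. The helper below is hypothetical (not part of DockStream), and the license paths above are site-specific:

```python
import json
import os

def apply_env_file(path: str) -> dict:
    # hypothetical helper: export each key/value pair from a json file
    # (e.g. OE_LICENSE, CSDHOME) into the current process environment
    with open(path) as f:
        env = json.load(f)
    os.environ.update(env)
    return env
```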
from dockstream.containers.container import ConfigurationContainer


class AnalysisContainer(ConfigurationContainer):
    """Class that takes a JSON configuration (as dictionary, string or file path) and, optionally,
    performs a JSON Schema validation."""

    def __init__(self, conf, validation=True):
        super().__init__(conf=conf)

        # TODO: include validation with JSON Schema
        if validation:
            self.validate()

    def validate(self):
        pass
        # load the Schema
        #path = os.path.join(files_paths.move_up_directory(__file__, 1),
        #                    "docking", "json_schemas",
        #                    "BuildingConfiguration.json")
        #schema = load_json.loadJSON(path=path)

        # instantiate validator and perform the check
        #validator = Draft7Validator(schema=schema)
        #validator.validate(self._conf)
import abc
import json
import os


class ConfigurationContainer(object, metaclass=abc.ABCMeta):
    """Abstract base class that takes a JSON configuration (as dictionary, string or file path) and,
    optionally, performs a JSON Schema validation."""

    @abc.abstractmethod
    def __init__(self, conf):
        # load the configuration; parameter "conf" can be a string, a path or a dictionary
        # (as long as it holds valid JSON input)
        if isinstance(conf, str):
            if os.path.isfile(conf):
                with open(conf) as file:
                    conf = file.read().replace("\r", "").replace("\n", "")
            conf = json.loads(conf)
        self._conf = conf

    def get_as_dict(self):
        return self._conf

    def get(self, key, default=None):
        return self._conf.get(key, default)

    def __getitem__(self, item):
        return self.get_as_dict()[item]

    def get_as_string(self):
        return json.dumps(self._conf)

    def validate(self):
        raise NotImplementedError("This function needs to be implemented by child classes.")
from dockstream.containers.container import ConfigurationContainer


class DockingContainer(ConfigurationContainer):
    """Class that takes a JSON configuration (as dictionary, string or file path) and, optionally,
    performs a JSON Schema validation."""

    def __init__(self, conf, validation=True):
        super().__init__(conf=conf)

        # TODO: include validation with JSON Schema
        if validation:
            self.validate()

    def validate(self):
        pass
        # load the Schema
        #path = os.path.join(files_paths.move_up_directory(__file__, 1),
        #                    "docking", "json_schemas",
        #                    "BuildingConfiguration.json")
        #schema = load_json.loadJSON(path=path)

        # instantiate validator and perform the check
        #validator = Draft7Validator(schema=schema)
        #validator.validate(self._conf)
from dockstream.containers.container import ConfigurationContainer


class TargetPreparationContainer(ConfigurationContainer):
    """Class that takes a JSON configuration (as dictionary, string or file path) and, optionally,
    performs a JSON Schema validation."""

    def __init__(self, conf, validation=True):
        super().__init__(conf=conf)

        # TODO: include validation with JSON Schema
        if validation:
            self.validate()

    def validate(self):
        pass
        # load the Schema
        #path = os.path.join(files_paths.move_up_directory(__file__, 1),
        #                    "docking", "json_schemas",
        #                    "BuildingConfiguration.json")
        #schema = load_json.loadJSON(path=path)

        # instantiate validator and perform the check
        #validator = Draft7Validator(schema=schema)
        #validator.validate(self._conf)
import pandas as pd

from dockstream.core.result_parser import ResultParser
from dockstream.utils.enums.AutodockVina_enums import AutodockResultKeywordsEnum


class AutodockResultParser(ResultParser):
    """Class that loads, parses and analyzes the output of an "AutoDock Vina" docking run, including poses and scores."""

    def __init__(self, ligands):
        super().__init__(ligands=ligands)
        self._RK = AutodockResultKeywordsEnum()

        self._df_results = self._construct_dataframe()

    def _construct_dataframe(self) -> pd.DataFrame:
        def func_get_score(conformer):
            return float(conformer.GetProp(self._RK.SDF_TAG_SCORE))

        return super()._construct_dataframe_with_funcobject(func_get_score)
import os
import tempfile
from rdkit import Chem
from dockstream.utils.dockstream_exceptions import TargetPreparationFailed
from dockstream.core.target_preparator import TargetPreparator
from dockstream.utils.execute_external.OpenBabel import OpenBabelExecutor
from dockstream.utils.enums.AutodockVina_enums import AutodockTargetPreparationEnum
from dockstream.utils.enums.OpenBabel_enums import OpenBabelExecutablesEnum
from dockstream.containers.target_preparation_container import TargetPreparationContainer
from dockstream.utils.general_utils import *
class AutodockVinaTargetPreparator(TargetPreparator):
"""Class that deals with all the target preparatory steps needed before docking using "Autodock Vina" can commence.
Note: AutoDockTools recommends to calculate Kollmann (QM-derived but templated for amino acids) charges for the receptor and
Gasteiger charges for the ligands, but in contrast to Autodock 4, the Vina flavour ignores the charges on the receptor. The
only thing we need thus to ensure is that (only) the polar hydrogens are present. See FAQs in: http://autodock.scripps.edu/faqs-help/tutorial/using-autodock-with-autodocktools/UsingAutoDockWithADT_v2e.pdf"""
def __init__(self, conf: TargetPreparationContainer, target, run_number=0):
self._TP = AutodockTargetPreparationEnum()
self._EE = OpenBabelExecutablesEnum()
# invoke base class's constructor first
super().__init__(conf=conf, run_number=run_number)
# check, whether the backend run specified is an "rDock" one
if self._run_parameters[self._TP.RUNS_BACKEND] != self._TP.RUNS_BACKEND_AUTODOCKVINA:
raise TargetPreparationFailed("Tried to make an AutoDock Vina preparation with different backend specification.")
# treat the target: either load a file or store the molecule internally
if isinstance(target, str):
if os.path.isfile(target):
_, file_extension = os.path.splitext(target)
if file_extension == ".pdb":
self._target = Chem.MolFromPDBFile(target, sanitize=False)
elif file_extension == ".mol2":
self._target = Chem.MolFromMol2File(target, sanitize=False)
else:
raise TargetPreparationFailed("Input target file past must end on either \".pdb\" or \".mol2\".")
self._logger.log(f"Target preparation: File {target} loaded.", self._TL.DEBUG)
else:
raise TargetPreparationFailed("Input target file does not exist.")
elif isinstance(target, Chem.Mol):
self._target = target
else:
raise TargetPreparationFailed("Constructor only accepts a Mol (RDkit) object or a file path.")
self._logger.log("Stored target as RDkit molecule.", self._TL.DEBUG)
# initialize the executor for all "OpenBabel" related calls and also check if it is available
# note, that while there is an "OpenBabel" API (python binding) which we also use, the match to the binary
# options is not trivial; thus, use command-line here
self._OpenBabel_executor = OpenBabelExecutor()
if not self._OpenBabel_executor.is_available():
raise TargetPreparationFailed("Cannot initialize OpenBabel external library, which should be part of the environment - abort.")
self._logger.log(f"Checked OpenBabel binary availability.", self._TL.DEBUG)
def _export_as_pdb2pdbqt(self, path):
# generate temporary copy
_, temp_target_pdb = tempfile.mkstemp(suffix=".pdb")
Chem.MolToPDBFile(mol=self._target, filename=temp_target_pdb)
# set target pH value that determines the protein's side-chain states
if in_keys(self._run_parameters, [self._TP.RUNS_PARAM, self._TP.PH]):
pH = float(self._run_parameters[self._TP.RUNS_PARAM][self._TP.PH])
else:
pH = 7.4
self._logger.log(f"As a specific pH was not specified, the default pH of {pH} will be used.", self._TL.INFO)
# Note: In contrast to the ligand preparation, we will not use a tree-based flexibility treatment here - thus,
# the option "-xr" is used. Partial charges of the receptor are not used in AutoDock Vina.
arguments = [temp_target_pdb,
self._EE.OBABEL_OUTPUT_FORMAT_PDBQT,
"".join([self._EE.OBABEL_O, path]),
"".join([self._EE.OBABEL_X, self._EE.OBABEL_X_R]),
self._EE.OBABEL_P, pH,
self._EE.OBABEL_PARTIALCHARGE, self._EE.OBABEL_PARTIALCHARGE_GASTEIGER]
self._OpenBabel_executor.execute(command=self._EE.OBABEL,
arguments=arguments,
check=False)
# clean up the temporary file
if os.path.exists(temp_target_pdb):
os.remove(temp_target_pdb)
self._logger.log(f"Exported target as PDBQT file {path}.", self._TL.DEBUG)
def _log_extract_box(self):
x_coords, y_coords, z_coords = self._extract_box()
if x_coords is not None:
def dig(value):
return round(value, ndigits=2)
self._logger.log(f"Ligand from file {self._run_parameters[self._TP.RUNS_PARAM][self._TP.EXTRACT_BOX][self._TP.EXTRACT_BOX_REFERENCE_LIGAND_PATH]} has the following dimensions:",
self._TL.INFO)
self._logger_blank.log(f"X coordinates: min={dig(min(x_coords))}, max={dig(max(x_coords))}, mean={dig(sum(x_coords)/len(x_coords))}",
self._TL.INFO)
self._logger_blank.log(f"Y coordinates: min={dig(min(y_coords))}, max={dig(max(y_coords))}, mean={dig(sum(y_coords)/len(y_coords))}",
self._TL.INFO)
self._logger_blank.log(f"Z coordinates: min={dig(min(z_coords))}, max={dig(max(z_coords))}, mean={dig(sum(z_coords)/len(z_coords))}",
self._TL.INFO)
def _extract_box(self):
# extracts box suggestions from a reference ligand, which can be added to a AutoDock Vina run
if in_keys(self._run_parameters, [self._TP.RUNS_PARAM, self._TP.EXTRACT_BOX]):
if in_keys(self._run_parameters, [self._TP.RUNS_PARAM, self._TP.EXTRACT_BOX, self._TP.EXTRACT_BOX_REFERENCE_LIGAND_PATH]) and \
in_keys(self._run_parameters, [self._TP.RUNS_PARAM, self._TP.EXTRACT_BOX, self._TP.EXTRACT_BOX_REFERENCE_LIGAND_FORMAT]):
# load the reference file (PDB or SDF)
ref_format = self._run_parameters[self._TP.RUNS_PARAM][self._TP.EXTRACT_BOX][self._TP.EXTRACT_BOX_REFERENCE_LIGAND_FORMAT].upper()
if ref_format == self._TP.EXTRACT_BOX_REFERENCE_LIGAND_FORMAT_PDB:
ref_mol = Chem.MolFromPDBFile(self._run_parameters[self._TP.RUNS_PARAM][self._TP.EXTRACT_BOX][self._TP.EXTRACT_BOX_REFERENCE_LIGAND_PATH],
sanitize=True)
elif ref_format == self._TP.EXTRACT_BOX_REFERENCE_LIGAND_FORMAT_SDF:
mol_supplier = Chem.SDMolSupplier(self._run_parameters[self._TP.RUNS_PARAM][self._TP.EXTRACT_BOX][self._TP.EXTRACT_BOX_REFERENCE_LIGAND_PATH])
for mol in mol_supplier:
ref_mol = mol
else:
raise TargetPreparationFailed("Specified format not supported.")
# extract coordinates
x_coords = [atom[0] for atom in ref_mol.GetConformer(0).GetPositions()]
y_coords = [atom[1] for atom in ref_mol.GetConformer(0).GetPositions()]
z_coords = [atom[2] for atom in ref_mol.GetConformer(0).GetPositions()]
return x_coords, y_coords, z_coords
else:
self._logger.log(f"In order to extract the box, both {self._TP.EXTRACT_BOX_REFERENCE_LIGAND_PATH} and {self._TP.EXTRACT_BOX_REFERENCE_LIGAND_FORMAT} must be defined.", self._TL.WARNING)
return None, None, None
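The min/max/mean statistics logged above are what a user would turn into an AutoDock Vina search box (center plus edge lengths). As a minimal, self-contained sketch of that step (the helper name, the coordinate-tuple input, and the padding default are illustrative, not part of DockStream):

```python
def suggest_vina_box(coords, padding=5.0):
    # coords: list of (x, y, z) atom positions from a reference ligand,
    # e.g. taken from mol.GetConformer(0).GetPositions() as in _extract_box
    xs, ys, zs = zip(*coords)
    # center the box on the ligand and pad each edge beyond its extent
    center = tuple((min(a) + max(a)) / 2.0 for a in (xs, ys, zs))
    size = tuple((max(a) - min(a)) + padding for a in (xs, ys, zs))
    return center, size
```
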
def specify_cavity(self):
# write out the input PDB as PDBQT file
self._export_as_pdb2pdbqt(self._run_parameters[self._TP.RUNS_OUTPUT][self._TP.RECEPTOR_PATH])
# if there is a reference ligand provided, calculate mean, minimum and maximum coordinates and log out
self._log_extract_box()
def write_target(self, path):
# TODO: move writing functionality here (and for rDock) to this method, respectively
pass
import os
import shutil
from copy import deepcopy
from typing import Tuple, Optional, List
from typing_extensions import Literal
import tempfile
from pydantic import BaseModel, PrivateAttr
from rdkit import Chem
from dockstream.core.RDkit.RDkit_ligand_preparator import RDkitLigandPreparator
from dockstream.core.ligand.ligand import get_next_enumeration_number_for_ligand, Ligand, reset_enumerations_for_ligands
from dockstream.core.ligand_preparator import LigandPreparator, _LE
from dockstream.utils.dockstream_exceptions import LigandPreparationFailed
from dockstream.utils.smiles import to_mol
from dockstream.utils.execute_external.Corina import CorinaExecutor
from dockstream.utils.enums.Corina_enums import CorinaLigandPreparationEnum, CorinaExecutablesEnum
_LP = CorinaLigandPreparationEnum()
_EE = CorinaExecutablesEnum()
class Parallelization(BaseModel):
number_cores: int = 1
max_compounds_per_subjob: Optional[int] = None
class CorinaLigandPreparatorParameters(BaseModel):
prefix_execution: Optional[str] = None
binary_location: Optional[str] = None
parallelization: Optional[Parallelization] = Parallelization()
enumerate_stereo: Optional[bool] = False
d_options: Optional[List[str]] = ["wh", "stergen", "preserve",
"noflapn", "ori", "ampax",
"names", "rc", "mc=1"]
class CorinaLigandPreparator(LigandPreparator, BaseModel):
"""Class that acts as an interface to the "Corina" executable to prepare ligands."""
type: Literal["Corina"] = "Corina"
parameters: CorinaLigandPreparatorParameters = CorinaLigandPreparatorParameters()
class Config:
underscore_attrs_are_private = True
_Corina_executor: CorinaExecutor = PrivateAttr()
def __init__(self, **data):
super().__init__(**data)
self._Corina_executor = CorinaExecutor(prefix_execution=self.parameters.prefix_execution,
binary_location=self.parameters.binary_location)
if not self._Corina_executor.is_available():
raise LigandPreparationFailed("Cannot initialize Corina backend - abort.")
self._logger.log(f"Checked Corina backend availability (prefix_execution={self.parameters.prefix_execution}, binary_location={self.parameters.binary_location}).",
_LE.DEBUG)
def _load_references(self):
references = []
ref_format = self.align.reference_format.upper()
for path in self.align.reference_paths:
if ref_format == _LP.ALIGN_REFERENCE_FORMAT_PDB:
ref_mol = Chem.MolFromPDBFile(path, sanitize=True)
ref_mol.SetProp("_Name", os.path.basename(path))
references.append(ref_mol)
elif ref_format == _LP.ALIGN_REFERENCE_FORMAT_SDF:
mol_supplier = Chem.SDMolSupplier(path)
for mol in mol_supplier:
references.append(mol)
else:
raise IOError("Specified format not supported.")
if len(references) == 0:
raise LigandPreparationFailed("No reference molecules could be loaded with path(s) specified.")
self._references = references
self._logger.log(f"Stored {len(references)} reference molecules.", _LE.DEBUG)
def _get_RDkit_aligner(self, conf, ligands):
return RDkitLigandPreparator(ligands=ligands, **conf)
def _get_d_parameters(self) -> str:
if isinstance(self.parameters.d_options, str):
self.parameters.d_options = [self.parameters.d_options]
if not isinstance(self.parameters.d_options, list) or len(self.parameters.d_options) == 0:
err_msg = f"If specified, parameter {_LP.D_OPTIONS} must be a list of strings."
self._logger.log(err_msg, _LE.ERROR)
raise ValueError(err_msg)
d_parameters = ','.join(self.parameters.d_options)
return d_parameters
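The value returned here becomes the comma-separated argument to Corina's "-d" flag (e.g. "wh,stergen,mc=1"). A stand-alone sketch of the same normalization logic (the function name is hypothetical; only the join behavior mirrors `_get_d_parameters`):

```python
def join_d_options(d_options):
    # accept a single option string or a list of option strings,
    # mirroring the coercion done in _get_d_parameters
    if isinstance(d_options, str):
        d_options = [d_options]
    if not isinstance(d_options, list) or len(d_options) == 0:
        raise ValueError("d_options must be a non-empty list of strings")
    # Corina expects the "-d" options as one comma-separated token
    return ",".join(d_options)
```
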
def _parse_molecules(self, tmp_sdf_path: str) -> List[Ligand]:
mol_supplier = Chem.SDMolSupplier(tmp_sdf_path, removeHs=False)
expanded_ligands = []
for mol in mol_supplier:
# Corina has a strange way of naming the conformers (e.g. "0:0_i001_c001" and "0:0_i002_c001" are
# conformers of the same ligand ordered by internal strain energy); also, the option "mc=1" does not
# really reduce the output to one conformer if multiple stereo-isomers are given
if mol is not None and mol.HasProp("_Name"):
name_parts = mol.GetProp("_Name").split('_')
# check, that only one conformation per enumeration is taken forward
if name_parts[2] != "c001":
continue
for lig in self.ligands:
if name_parts[0] == lig.get_identifier():
# check, if it is the first (energy minimized) one and add it in case
# stereo-enumeration is disabled
if self.parameters.enumerate_stereo or name_parts[1] == "i001":
expanded_ligands.append(Ligand(smile=Chem.MolToSmiles(mol, isomericSmiles=True),
original_smile=lig.get_original_smile(),
ligand_number=lig.get_ligand_number(),
enumeration=lig.get_enumeration(),
molecule=mol,
mol_type=_LP.TYPE_CORINA,
name=lig.get_name()))
else:
self._logger.log(
"Skipped molecule when loading as _Name property could not be found - typically, this indicates that Corina could not embed the molecule.",
_LE.DEBUG)
return expanded_ligands
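The filtering rule applied above to Corina's "identifier_iNNN_cNNN" titles can be isolated as follows; this is an illustrative helper (the function name is invented), mirroring the two checks in `_parse_molecules`:

```python
def keep_conformer(name, enumerate_stereo=False):
    # name has the Corina form "identifier_iNNN_cNNN", where "iNNN" indexes
    # the stereo-isomer and "cNNN" the conformer (ordered by strain energy)
    identifier, isomer, conformer = name.split("_")
    # only the lowest-energy conformer per enumeration is taken forward
    if conformer != "c001":
        return False
    # without stereo-enumeration, only the first (energy-minimized) isomer survives
    return enumerate_stereo or isomer == "i001"
```
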
def _smiles_to_molecules(self, ligands: List[Ligand]) -> List[Ligand]:
for lig in ligands:
mol = to_mol(lig.get_smile())
lig.set_molecule(mol)
lig.set_mol_type(_LP.TYPE_CORINA)
return ligands
def generate3Dcoordinates(self):
for lig in self.ligands:
lig.set_molecule(None)
lig.set_mol_type(None)
ligand_list = self._smiles_to_molecules(deepcopy(self.ligands))
# 1) generate temporary folder and files
tmp_output_dir_path = tempfile.mkdtemp()
fd, tmp_smiles_path = tempfile.mkstemp(suffix=".smi", dir=tmp_output_dir_path)
fd, tmp_molecules_path = tempfile.mkstemp(suffix=".sdf", dir=tmp_output_dir_path)
# 2) save the SMILES
with open(tmp_smiles_path, 'w') as f:
for lig in ligand_list:
f.write(lig.get_smile() + " " + lig.get_identifier() + "\n")
# 3) get "-d" parameters (either default or user specified)
d_parameters = self._get_d_parameters()
# 4) run "Corina" backend
result = self._Corina_executor.execute(command=_EE.CORINA,
arguments=[_EE.CORINA_D, d_parameters,
_EE.CORINA_T, _EE.CORINA_T_DISABLED,
_EE.CORINA_I, _EE.CORINA_T_SMILES, tmp_smiles_path,
_EE.CORINA_O, _EE.CORINA_T_SDF, tmp_molecules_path],
check=False)
self._logger.log(f"Executed Corina backend (output file: {tmp_molecules_path}).", _LE.DEBUG)
# 5) load and store the conformers; name it sequentially
# note, that some backends require all H-coordinates (such as Glide) - so keep them!
expanded_ligands = self._parse_molecules(tmp_molecules_path)
# 6) merge newly embedded ligands with the old list
merged_list = []
for lig_old in self.ligands:
# make a list with all the new enumerations for a given "old" ligand
lig_enums_list = [lig_enum for lig_enum in expanded_ligands if lig_enum.get_identifier() == lig_old.get_identifier()]
if len(lig_enums_list) == 0:
# embedding failed completely, keep the old ligand (with "molecule" set to "None")
merged_list.append(lig_old)
else:
# embedding succeeded, so replace the original ligand with the one (or more) embedded enumerations
for lig_emb in lig_enums_list:
merged_list.append(lig_emb)
reset_enumerations_for_ligands(merged_list)
self.ligands = merged_list
not_embedded = len([True for lig in self.ligands if lig.get_molecule() is None])
if not_embedded > 0:
self._logger.log(f"Corina might have had issues embedding all {len(self.ligands)} ligands: {not_embedded} were not obtained.",
_LE.WARNING)
for lig in self.ligands:
if lig.get_molecule() is None:
self._logger.log(f"It appears, Corina could not embed ligand {lig.get_identifier()} (smile: {lig.get_smile()}).",
_LE.DEBUG)
# 7) remove temporary files
shutil.rmtree(tmp_output_dir_path)
self._logger.log(f"In total, {len([True for lig in self.ligands if lig.get_molecule() is not None])} ligands (including enumerations) embedded (Corina backend).", _LE.DEBUG)
def align_ligands(self):
self.ligands = self._align_ligands_with_RDkit_preparator(self.ligands)
import pandas as pd
from dockstream.core.result_parser import ResultParser
from dockstream.utils.enums.Gold_enums import GoldOutputEnum, GoldDockingConfigurationEnum
class GoldResultParser(ResultParser):
"""Class that loads, parses and analyzes the output of a "Gold" docking run, including poses and scores."""
def __init__(self, ligands: list, fitness_function: str, response_value="fitness"):
super().__init__(ligands=ligands)
self._ROE = GoldOutputEnum()
self._DE = GoldDockingConfigurationEnum()
self._fitness_function = fitness_function
self._response_value = response_value
self._df_results = self._construct_dataframe()
def _get_scoring_function_parameters(self):
# get the appropriate name of the tag and whether minimal or maximal values are best for
# the specified scoring function
if self._response_value == self._DE.GOLD_RESPONSE_VALUE_FITNESS:
scoring_function_parameters = self._ROE.DICT_FITNESS[self._fitness_function]
elif self._response_value == self._DE.GOLD_RESPONSE_VALUE_VALUE:
scoring_function_parameters = self._ROE.DICT_VALUE[self._fitness_function]
else:
raise ValueError("Parameter response value must be either fitness or value.")
self._logger.log(f"Set scoring_function_parameters to {scoring_function_parameters} for result parsing.",
self._LE.DEBUG)
return scoring_function_parameters
def _construct_dataframe(self) -> pd.DataFrame:
scoring_function_parameters = self._get_scoring_function_parameters()
def func_get_score(conformer):
return float(conformer.GetProp(scoring_function_parameters[self._ROE.TAG]))
return super()._construct_dataframe_with_funcobject(func_get_score)
import os
import pickle
import ccdc
from ccdc.docking import Docker
from ccdc.io import MoleculeReader
from dockstream.core.target_preparator import TargetPreparator
from dockstream.utils.dockstream_exceptions import TargetPreparationFailed
from dockstream.utils.enums.Gold_enums import GoldTargetPreparationEnum, GoldTargetKeywordEnum
from dockstream.containers.target_preparation_container import TargetPreparationContainer
class GoldTargetPreparator(TargetPreparator):
"""Class that deals with all the target preparatory steps needed before docking using "GOLD" can commence."""
def __init__(self, conf: TargetPreparationContainer, target, run_number=0):
self._TP = GoldTargetPreparationEnum()
self._TK = GoldTargetKeywordEnum()
self._target_dict = {self._TK.VERSION: self._TK.CURRENT_VERSION}
# invoke base class's constructor first
super().__init__(conf=conf, run_number=run_number)
# check, whether the backend run specified is a "GOLD" one
if self._run_parameters[self._TP.RUNS_BACKEND] != self._TP.RUNS_BACKEND_GOLD:
raise TargetPreparationFailed("Tried to make a GOLD preparation with different backend specification.")
if isinstance(target, str):
if os.path.isfile(target):
_, file_extension = os.path.splitext(target)
if file_extension == ".pdb":
self._target = Docker()
self._settings = self._target.settings
self._settings.add_protein_file(file_name=target)
# add information to dictionary
with open(target, 'r') as file:
self._target_dict[self._TK.TARGET_PDB] = [line for line in file]
self._target_dict[self._TK.TARGET_PDB_FILENAME] = os.path.basename(target)
else:
raise TargetPreparationFailed("Specified input file must be in PDB format for GOLD.")
else:
raise TargetPreparationFailed("Input target file does not exist.")
elif isinstance(target, ccdc.docking.Docker):
raise NotImplementedError
else:
raise TargetPreparationFailed("Constructor only accepts a Protein.BindingSite object or a file path.")
self._logger.log("Added target to GOLD settings.", self._TL.DEBUG)
def specify_cavity(self):
self._target_dict[self._TK.CAVITY_METHOD] = self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_METHOD]
if self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_METHOD] == self._TP.CAVITY_METHOD_REFERENCE:
# note, that "MoleculeReader" is able to discern many formats from the file extension, including "mol2" and "pdb"
ref_ligand_path = self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_REFERENCE_PATH]
ref_ligand = MoleculeReader(filename=ref_ligand_path)
protein = self._settings.proteins[0]
self._settings.binding_site = self._settings.BindingSiteFromLigand(protein, ref_ligand, distance=self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_REFERENCE_DISTANCE])
# add information to dictionary
with open(ref_ligand_path, 'r') as file:
self._target_dict[self._TK.REFERENCE_LIGAND] = [line for line in file]
self._target_dict[self._TK.CAVITY_REFERENCE_DISTANCE] = self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_REFERENCE_DISTANCE]
self._target_dict[self._TK.REFERENCE_LIGAND_FILENAME] = os.path.basename(ref_ligand_path)
elif self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_METHOD] == self._TP.CAVITY_METHOD_POINT:
raise NotImplementedError
# origin (x,x,x)
# distance x
else:
raise TargetPreparationFailed("Specified cavity determination method not defined for GOLD.")
self._logger.log(f"Generated GOLD Protein.BindingSite with method {self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_METHOD]}.", self._TL.DEBUG)
def write_target(self, path):
_, file_extension = os.path.splitext(path)
if file_extension != ".pkl":
raise TargetPreparationFailed("Receptor files must end on .pkl.")
if self._TK.CAVITY_METHOD not in self._target_dict:
self._logger.log("Need to have executed specify_cavity before writing out result - will attempt this now.", self._TL.WARNING)
self.specify_cavity()
with open(path, "wb") as f:
pickle.dump(self._target_dict, f)
self._logger.log(f"Wrote binding site to file {path}.", self._TL.DEBUG)
from typing import List
from pydantic import BaseModel
from rdkit import Chem
import openeye.oechem as oechem
import openeye.oeomega as oeomega
from copy import deepcopy
from typing_extensions import Literal
from dockstream.utils.dockstream_exceptions import LigandPreparationFailed
from dockstream.core.ligand_preparator import LigandPreparator, _LE
from dockstream.core.RDkit.RDkit_ligand_preparator import RDkitLigandPreparator
from dockstream.utils.translations.molecule_translator import MoleculeTranslator
from dockstream.utils.translations.translation import RDkitMolToOpenEyeMol
from dockstream.utils.enums.OpenEye_enums import OpenEyeLigandPreparationEnum
from dockstream.core.ligand.ligand import Ligand
_LP = OpenEyeLigandPreparationEnum()
class OpenEyeLigandPreparator(LigandPreparator, BaseModel):
type: Literal["OpenEye"] = "OpenEye"
class Config:
underscore_attrs_are_private = True
def __init__(self, **data):
super().__init__(**data)
def _initialize_ligands(self):
super()._initialize_ligands()
def _load_references(self):
references = []
for path in self.align.reference_paths:
mol_supplier = oechem.oemolistream()
# set the provided format
ref_format = self.align.reference_format.upper()
if ref_format == _LP.ALIGN_REFERENCE_FORMAT_SDF:
mol_supplier.SetFormat(oechem.OEFormat_SDF)
elif ref_format == _LP.ALIGN_REFERENCE_FORMAT_PDB:
mol_supplier.SetFormat(oechem.OEFormat_PDB)
else:
raise LigandPreparationFailed("Specified format not supported!")
if mol_supplier.open(path):
for mol in mol_supplier.GetOEMols():
references.append(oechem.OEMol(mol))
else:
oechem.OEThrow.Fatal("Unable to open specified input file.")
if len(references) == 0:
raise LigandPreparationFailed("No reference molecules could be loaded at path(s) specified.")
self._references = references
self._logger.log(f"Stored {len(references)} reference molecules.", _LE.DEBUG)
def _get_RDkit_aligner(self, conf, ligands):
return RDkitLigandPreparator(ligands=ligands, **conf)
def _smiles_to_molecules(self, ligands: List[Ligand]) -> List[Ligand]:
for lig in ligands:
lig_molecule = oechem.OEMol()
oechem.OESmilesToMol(lig_molecule, lig.get_smile())
lig.set_molecule(lig_molecule)
lig.set_mol_type(_LP.TYPE_OPENEYE)
return ligands
def generate3Dcoordinates(self):
"""Method to generate 3D coordinates, in case the molecules have been built from SMILES."""
for lig in self.ligands:
lig.set_molecule(None)
lig.set_mol_type(None)
ligand_list = self._smiles_to_molecules(deepcopy(self.ligands))
failed = 0
succeeded = 0
builder = oeomega.OEConformerBuilder()
for idx, ligand in enumerate(ligand_list):
inp_mol = ligand.get_molecule()
if inp_mol is None:
continue
return_code = builder.Build(inp_mol)
if return_code != oeomega.OEOmegaReturnCode_Success:
failed += 1
self._logger.log(f"The 3D coordinate generation of molecule {ligand.get_ligand_number()} (smile: {ligand.get_smile()}) failed (oeomega return code={return_code}).",
_LE.DEBUG)
continue
self.ligands[idx] = Ligand(smile=ligand.get_smile(),
original_smile=ligand.get_original_smile(),
ligand_number=ligand.get_ligand_number(),
enumeration=ligand.get_enumeration(),
molecule=oechem.OEMol(inp_mol),
mol_type=_LP.TYPE_OPENEYE,
name=ligand.get_name())
succeeded += 1
if failed > 0:
self._logger.log(f"Of {len(self.ligands)}, {failed} could not be embedded.", _LE.WARNING)
self._logger.log(f"In total, {succeeded} ligands were successfully embedded (oeomega).", _LE.DEBUG)
def align_ligands(self):
if self.align.mode != _LP.ALIGN_MODE_INTERNAL:
raise LigandPreparationFailed("Only internal alignment supported at the moment.")
if self._references is None:
raise LigandPreparationFailed("No reference molecule has been found.")
# use the general, internal alignment technique
# ---------
# 1) translate the ligands from openeye to rdkit and do not use "bySMILES" method, as
# coordinates would be lost
mol_trans = MoleculeTranslator(self.ligands)
ligands_rdkit = mol_trans.get_as_rdkit()
self._logger.log(f"Align: Of {len(self.ligands)}, {len(ligands_rdkit)} were translated to RDkit molecules.",
_LE.DEBUG)
# 2) do the alignment to a reference molecule; also disable RDkit logger
ligands_rdkit = self._align_ligands_with_RDkit_preparator(ligands_rdkit)
# 3) translate ligands back and update internal collection
mol_trans = MoleculeTranslator(ligands_rdkit)
translated_mols = mol_trans.get_as_openeye()
for lig, translated_mol in zip(self.ligands, translated_mols):
lig.set_molecule(translated_mol.get_molecule())
def write_ligands(self, path, format):
ofs = oechem.oemolostream()
format = format.upper()
ligands_copy = [deepcopy(lig) for lig in self.ligands]
# check and specify format of file
if format == _LP.OUTPUT_FORMAT_SDF:
ofs.SetFormat(oechem.OEFormat_SDF)
elif format == _LP.OUTPUT_FORMAT_MOL2:
ofs.SetFormat(oechem.OEFormat_MOL2)
else:
raise LigandPreparationFailed("Specified output format unknown.")
if ofs.open(path):
for lig in ligands_copy:
lig.add_tags_to_molecule()
if lig.get_molecule() is not None:
mol = deepcopy(lig.get_molecule())
mol.SetTitle(lig.get_identifier())
oechem.OEWriteMolecule(ofs, mol)
else:
oechem.OEThrow.Fatal("Unable to create specified output file.")
ofs.close()
self._logger.log(f"Wrote {len(self.ligands)} molecules to file {path} (format: {format}).", _LE.DEBUG)
def _make_ligands_from_molecules(self, ligands):
buffer = []
if isinstance(ligands[0], Chem.Mol):
ligands = [RDkitMolToOpenEyeMol(mol, bySMILES=False) for mol in ligands]
for index_mol, mol in enumerate(ligands):
buffer.append(Ligand(smile=oechem.OEMolToSmiles(mol),
ligand_number=index_mol,
enumeration=0,
molecule=mol,
mol_type=_LP.TYPE_OPENEYE))
self.ligands = buffer
import pandas as pd
from dockstream.core.result_parser import ResultParser
from dockstream.utils.enums.OpenEye_enums import OpenEyeResultKeywordsEnum
class OpenEyeResultParser(ResultParser):
"""Class that loads, parses and analyzes the output of an "OpenEye" docking run, including poses and scores."""
def __init__(self, ligands: list):
super().__init__(ligands=ligands)
self._RK = OpenEyeResultKeywordsEnum()
self._df_results = self._construct_dataframe()
def _construct_dataframe(self) -> pd.DataFrame:
def func_get_score(conformer):
return float(conformer.GetEnergy())
return super()._construct_dataframe_with_funcobject(func_get_score)
import os
from dockstream.core.target_preparator import TargetPreparator
import openeye.oechem as oechem
import openeye.oedocking as oedocking
from dockstream.utils.dockstream_exceptions import TargetPreparationFailed
from dockstream.utils.enums.OpenEye_enums import OpenEyeTargetPreparationEnum
from dockstream.containers.target_preparation_container import TargetPreparationContainer
class OpenEyeTargetPreparator(TargetPreparator):
"""Class that deals with all the target preparatory steps needed before docking using "OpenEye" can commence."""
def __init__(self, conf: TargetPreparationContainer, target, run_number=0):
self._TP = OpenEyeTargetPreparationEnum()
# invoke base class's constructor first
super().__init__(conf=conf, run_number=run_number)
# check, whether the backend run specified is an "OpenEye" one
if self._run_parameters[self._TP.RUNS_BACKEND] != self._TP.RUNS_BACKEND_OPENEYE:
raise TargetPreparationFailed("Tried to make an OpenEye preparation with different backend specification.")
# treat the target: either load a file or store the molecule internally
if isinstance(target, str):
if os.path.isfile(target):
_, file_extension = os.path.splitext(target)
if file_extension == ".pdb":
istream = oechem.oemolistream(target)
protein = oechem.OEGraphMol()
oechem.OEReadMolecule(istream, protein)
self._protein = protein
else:
raise TargetPreparationFailed("Specified input file must be in PDB format for OpenEye.")
else:
raise TargetPreparationFailed("Input target file does not exist.")
elif isinstance(target, oechem.OEGraphMol):
self._target = target
else:
raise TargetPreparationFailed("Constructor only accepts an OEGraphMol (OpenEye) object or a file path.")
self._logger.log("Stored target as OpenEye molecule.", self._TL.DEBUG)
def specify_cavity(self):
target = oechem.OEGraphMol()
if self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_METHOD] == self._TP.CAVITY_METHOD_BOX:
# specify 6 floating-point numbers which define a "box" around the cavity
limits = [float(x) for x in self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_BOX_LIMITS]]
if len(limits) != 6:
raise TargetPreparationFailed("The limits for the box specification must be an array of 6 values.")
box = oedocking.OEBox(*limits)
oedocking.OEMakeReceptor(target, self._protein, box)
elif self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_METHOD] == self._TP.CAVITY_METHOD_REFERENCE:
# use a reference molecule to specify the cavity
ref_istream = oechem.oemolistream(self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_REFERENCE_PATH])
# set the provided format
ref_format = self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_REFERENCE_FORMAT].upper()
if ref_format == self._TP.CAVITY_REFERENCE_FORMAT_SDF:
ref_istream.SetFormat(oechem.OEFormat_SDF)
elif ref_format == self._TP.CAVITY_REFERENCE_FORMAT_PDB:
ref_istream.SetFormat(oechem.OEFormat_PDB)
else:
raise TargetPreparationFailed("Specified format not supported!")
ref = oechem.OEGraphMol()
oechem.OEReadMolecule(ref_istream, ref)
oedocking.OEMakeReceptor(target, self._protein, ref)
elif self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_METHOD] == self._TP.CAVITY_METHOD_HINT:
# specify a point (a "hint") that is in or close at the cavity; three coordinates required
coordinates = [float(x) for x in self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_HINT_COORDINATES]]
if len(coordinates) != 3:
raise TargetPreparationFailed("Method hint requires 3 values.")
oedocking.OEMakeReceptor(target, self._protein, *coordinates)
else:
raise TargetPreparationFailed("Specified cavity determination method not defined for OpenEye.")
self._target = target
self._logger.log(f"Generated OpenEye receptor with method {self._run_parameters[self._TP.CAVITY][self._TP.CAVITY_METHOD]}.", self._TL.DEBUG)
def write_target(self, path):
# note, that os.path.splitext() returns ".gz" for ".oeb.gz", so check the full suffix instead
if not path.endswith(".oeb") and not path.endswith(".oeb.gz"):
raise TargetPreparationFailed("Receptor files must end on either .oeb or .oeb.gz.")
oedocking.OEWriteReceptorFile(self._target, path)
self._logger.log(f"Wrote receptor to file {path}.", self._TL.DEBUG)
import openeye.oechem as oechem
from dockstream.utils.dockstream_exceptions import LigandPreparationFailed, TransformationFailed
from dockstream.utils.enums.OpenEye_enums import OpenEyeDockingConfigurationEnum, OpenEyeLigandPreparationEnum
from dockstream.core.transformator import Transformator
class OpenEyeTransformator(Transformator):
"""Class that applies SMIRKS (OpenEye style) to compounds before they are further processed."""
def __init__(self, conf):
super().__init__(conf)
self._LP = OpenEyeLigandPreparationEnum()
self._CE = OpenEyeDockingConfigurationEnum()
def transform(self, ligands) -> list:
number_input = len(ligands)
failed_indices = []
# code based on Graeme Robb's script
if self._type == self._TE.TRANSFORMATION_TYPE_SMIRKS:
rxn = oechem.OEUniMolecularRxn(self._smirk)
for ligand_number, ligand in enumerate(ligands):
try:
molecule = oechem.OEMol()
oechem.OESmilesToMol(molecule, ligand.get_smile())
# apply the smirk
oechem.OEAddExplicitHydrogens(molecule)
success = rxn(molecule)
if success:
ligand.set_smile(oechem.OEMolToSmiles(molecule))
else:
raise TransformationFailed
except Exception as e:
failed_indices.append(ligand_number)
if self._fail_action == self._TE.TRANSFORMATION_FAIL_ACTION_DISCARD:
failed_indices.reverse()
for index in failed_indices:
self._logger.log(f"Failed transformation, discarding ligand with smile {ligands[index].get_smile()}.", self._LE.DEBUG)
del ligands[index]
else:
self._logger.log(f"For transformation backend {self._backend}, only type {self._TE.TRANSFORMATION_TYPE_SMIRKS} is supported.", self._LE.ERROR)
raise LigandPreparationFailed(f"For transformation backend {self._backend}, only type {self._TE.TRANSFORMATION_TYPE_SMIRKS} is supported.")
self._logger.log(f"Of {number_input} input smiles, {len(ligands)} smiles were transformed / retained ({len(failed_indices)} transformations failed with \"fail_action\" set to {self._fail_action}).", self._LE.DEBUG)
for ligand in ligands:
self._logger_blank.log(ligand.get_smile(), self._LE.DEBUG)
return ligands
import pandas as pd
from dockstream.core.result_parser import ResultParser
from dockstream.utils.enums.OE_Hybrid_enums import OpenEyeHybridOutputKeywordsEnum
class OpenEyeHybridResultParser(ResultParser):
"""Loads, parses and analyzes the output of an "OpenEye Hybrid" docking run, including poses and score."""
def __init__(self, ligands: list):
super().__init__(ligands=ligands)
self._OE = OpenEyeHybridOutputKeywordsEnum()
self._df_results = self._construct_dataframe()
def _construct_dataframe(self) -> pd.DataFrame:
def func_get_score(conformer):
return float(conformer.GetProp(self._OE.SCORE))
return super()._construct_dataframe_with_funcobject(func_get_score)
from copy import deepcopy
from typing import Optional, List
from typing_extensions import Literal
from pydantic import BaseModel
from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers, StereoEnumerationOptions
from dockstream.core.stereo_enumerator import StereoEnumerator
from dockstream.core.ligand.ligand import Ligand, get_next_enumeration_number_for_ligand
class RDKitStereoEnumeratorParameters(BaseModel):
try_embedding: bool = True
unique: bool = True
max_isomers: int = 1024
rand: Optional[int] = 0xf00d
class RDKitStereoEnumerator(StereoEnumerator, BaseModel):
backend: Literal["RDKit"] = "RDKit"
parameters: RDKitStereoEnumeratorParameters = RDKitStereoEnumeratorParameters()
def __init__(self, **data):
super().__init__(**data)
def enumerate(self, ligands: List[Ligand]) -> List[Ligand]:
new_ligands_list = []
opts = StereoEnumerationOptions(tryEmbedding=self.parameters.try_embedding,
unique=self.parameters.unique,
maxIsomers=self.parameters.max_isomers,
rand=self.parameters.rand)
for ligand in ligands:
molecule = Chem.MolFromSmiles(ligand.get_smile())
if not molecule:
# could not build molecule, keep the original ligand
new_ligands_list.append(deepcopy(ligand))
continue
isomers = tuple(EnumerateStereoisomers(molecule, options=opts))
if len(isomers) == 0:
# could not enumerate, keep original ligand
new_ligands_list.append(deepcopy(ligand))
continue
# loop over stereo-isomers, translate them into smiles and create new ligand objects from them
for new_smile_id, new_smile in enumerate(sorted(Chem.MolToSmiles(x, isomericSmiles=True) for x in isomers)):
new_ligands_list.append(Ligand(smile=new_smile,
original_smile=ligand.get_original_smile(),
ligand_number=ligand.get_ligand_number(),
enumeration=get_next_enumeration_number_for_ligand(new_ligands_list,
ligand.get_ligand_number()),
molecule=None,
mol_type=None,
name=ligand.get_name()))
return new_ligands_list
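The per-ligand counter obtained above via `get_next_enumeration_number_for_ligand` can be sketched in isolation. The semantics assumed here (next free enumeration index for a given ligand number, starting at 0) are an illustration inferred from the surrounding usage, not DockStream's actual implementation:

```python
def next_enumeration_number(collected, ligand_number):
    # collected: (ligand_number, enumeration) pairs gathered so far,
    # standing in for the Ligand objects in new_ligands_list
    enums = [e for (n, e) in collected if n == ligand_number]
    # first enumeration for a ligand is 0, afterwards increment the maximum
    return max(enums) + 1 if enums else 0
```
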
import pandas as pd
from dockstream.core.result_parser import ResultParser
from dockstream.utils.enums.Schrodinger_enums import SchrodingerOutputEnum
class GlideResultParser(ResultParser):
"""Class that loads, parses and analyzes the output of a "Glide" docking run, including poses and scores."""
def __init__(self, ligands: list):
super().__init__(ligands=ligands)
self._df_results = self._construct_dataframe()
def _construct_dataframe(self) -> pd.DataFrame:
def func_get_score(conformer):
_ROE = SchrodingerOutputEnum()
return float(conformer.GetProp(_ROE.GLIDE_DOCKING_SCORE))
return super()._construct_dataframe_with_funcobject(func_get_score)
import time
from pydantic import BaseModel, PrivateAttr
from typing import Optional, Dict
from dockstream.loggers.docking_logger import DockingLogger
from dockstream.utils.execute_external.execute import Executor
from dockstream.utils.enums.ligand_preparation_enum import LigandPreparationEnum
from dockstream.utils.enums.Schrodinger_enums import SchrodingerExecutablesEnum, \
SchrodingerDockingConfigurationEnum, \
SchrodingerOutputEnum
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
_LP = LigandPreparationEnum()
_CE = SchrodingerDockingConfigurationEnum()
_EE = SchrodingerExecutablesEnum()
_ROE = SchrodingerOutputEnum()
_LE = LoggingConfigEnum()
class SchrodingerLicenseTokenGuard(BaseModel):
"""Class that checks, whether enough tokens to execute Glide are available."""
token_pools: Optional[Dict]
prefix_execution: Optional[str] = None
binary_location: Optional[str] = None
wait_interval_seconds: int = 30
wait_limit_seconds: int = 0
_logger: DockingLogger = PrivateAttr()
_executor: Executor = PrivateAttr()
class Config:
underscore_attrs_are_private = True
def __init__(self, **data):
super().__init__(**data)
self._logger = DockingLogger()
# initialize the executor for all "Schrodinger" related calls and also check if it is available
self._executor = Executor(prefix_execution=self.prefix_execution,
binary_location=self.binary_location)
def _get_token_pool_info(self, licadmin_output: list, token_pool: str) -> dict:
result = {"found": False}
for line in licadmin_output:
if token_pool in line:
parts = line.split(' ')
if len(parts) == 16:
result["total"] = int(parts[6])
result["available"] = int(parts[6]) - int(parts[12])
result["found"] = True
break
return result
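The fixed 16-field split used above is brittle by design (it only trusts lines matching licadmin's exact layout). A stand-alone sketch of that parse, under the same assumptions as `_get_token_pool_info` (total at index 6, tokens in use at index 12; the function name is invented):

```python
def parse_pool_line(line):
    # licadmin status lines are split on single spaces; anything that does
    # not yield exactly 16 fields is ignored, as in _get_token_pool_info
    parts = line.split(' ')
    if len(parts) != 16:
        return {"found": False}
    total = int(parts[6])
    used = int(parts[12])
    return {"found": True, "total": total, "available": total - used}
```
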
def _check_licstat_output(self, licadmin_output: list) -> bool:
token_pools_to_check = self.token_pools
all_pools_available = True
for pool_key, pool_token_numbers in token_pools_to_check.items():
pool_status = self._get_token_pool_info(licadmin_output, pool_key)
if pool_status["found"]:
if pool_status["available"] >= pool_token_numbers:
self._logger.log(f"Enough tokens available ({pool_status['available']}) to satisfy requirement ({pool_token_numbers} free tokens) for pool {pool_key}.",
_LE.DEBUG)
else:
self._logger.log(f"Not enough tokens available ({pool_status['available']}) to satisfy requirement ({pool_token_numbers} free tokens) for pool {pool_key}.",
_LE.DEBUG)
all_pools_available = False
else:
all_pools_available = False
self._logger.log(f"Could not find information on token pool {pool_key}.",
_LE.WARNING)
return all_pools_available
def _get_licstat_output(self):
result = self._executor.execute(command=_EE.LICADMIN, arguments=[_EE.LICADMIN_STAT], check=True)
if result.returncode != 0:
self._logger.log(f"Could not execute the Schrodinger license token guard - do you need to export the licadmin path?",
_LE.WARNING)
return result.stdout.split("\n")
def guard(self) -> bool:
# set the parameters for the way the output is checked
self._logger.log(f"Set waiting interval time for Schrodinger token guard to {self.wait_interval_seconds} seconds.",
_LE.DEBUG)
self._logger.log(f"Set waiting limit for Schrodinger token guard to {self.wait_limit_seconds} (seconds, but 0 means \"infinite\").",
_LE.DEBUG)
# loop over the token pools until they are all satisfied or the time limit has run out
counter = 0
success = False
while True:
if self.wait_limit_seconds != 0 and (counter * self.wait_interval_seconds) >= self.wait_limit_seconds:
self._logger.log(f"Wait period ({self.wait_limit_seconds} seconds) set for Schrodinger token guard has been exceeded.",
_LE.ERROR)
break
# reload the output from "licadmin"
# at this stage, the output from licadmin is a list of strings
licadmin_output = self._get_licstat_output()
all_pools_available = self._check_licstat_output(licadmin_output=licadmin_output)
if all_pools_available:
self._logger.log("All token pool requirements for Schrodinger have been met - proceeding.",
_LE.DEBUG)
success = True
break
else:
time.sleep(self.wait_interval_seconds)
counter = counter + 1
return success
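The field-position parsing in `_get_token_pool_info` can be sketched in isolation: a pool line is matched by substring, split on single spaces into exactly 16 fields, with field 7 the issued and field 13 the used token count. The sample line below is invented for illustration; real `licadmin STAT` output may be formatted differently.

```python
def parse_token_pool(licadmin_output, token_pool):
    # mirrors SchrodingerLicenseTokenGuard._get_token_pool_info
    result = {"found": False}
    for line in licadmin_output:
        if token_pool in line:
            parts = line.split(' ')
            if len(parts) == 16:
                result["total"] = int(parts[6])
                result["available"] = int(parts[6]) - int(parts[12])
                result["found"] = True
                break
    return result

# hypothetical 16-field line for a pool named GLIDE_SUITE
sample = ["Lic info for pool GLIDE_SUITE : 30 tokens issued , in use 11 tokens free now"]
status = parse_token_pool(sample, "GLIDE_SUITE")  # {'found': True, 'total': 30, 'available': 19}
```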
import shutil
import tempfile
from collections import OrderedDict
from dockstream.loggers.ligand_preparation_logger import LigandPreparationLogger
from dockstream.loggers.blank_logger import BlankLogger
from dockstream.utils.dockstream_exceptions import LigandPreparationFailed
from dockstream.utils.execute_external.TautEnum import TautEnumExecutor
from dockstream.utils.enums.taut_enum_enums import TautEnumEnum
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
from dockstream.core.ligand.ligand import Ligand, get_next_enumeration_number_for_ligand
class TautEnumSmilePreparator:
"""Class that acts as an interface to the "TautEnum" executable prepare and annotate SMILES."""
def __init__(self, enumerate_protonation: bool, original_enumeration: bool,
add_numbers_to_name: bool, prefix_execution=None, binary_location=None):
self._TE = TautEnumEnum()
self._LE = LoggingConfigEnum()
self._logger = LigandPreparationLogger()
self._logger_blank = BlankLogger()
self._enumerate_protonation = enumerate_protonation
self._original_enumeration = original_enumeration
self._add_numbers_to_name = add_numbers_to_name
self._prefix_execution = prefix_execution
self._binary_location = binary_location
# check, if backend is available
self._TautEnum_executor = TautEnumExecutor(prefix_execution=self._prefix_execution,
binary_location=self._binary_location)
if not self._TautEnum_executor.is_available():
raise LigandPreparationFailed("Cannot initialize TautEnum backend - abort.")
self._logger.log(f"Checked taut_enum backend availability (prefix_execution={prefix_execution}).", self._LE.DEBUG)
def annotate_tautomers(self, ligands: list) -> list:
"""Method to build all the tautomers for the input SMILES."""
# 1) generate temporary folder and files
tmp_output_dir_path = tempfile.mkdtemp()
fd, tmp_input_smiles_path = tempfile.mkstemp(suffix=".smi", dir=tmp_output_dir_path)
fd, tmp_output_smiles_path = tempfile.mkstemp(suffix=".smi", dir=tmp_output_dir_path)
# 2) save the SMILES
original_smiles = []
with open(tmp_input_smiles_path, 'w') as f:
for lig in ligands:
f.write(lig.get_smile() + ' ' + str(lig.get_ligand_number()) + "\n")
original_smiles.append(lig.get_original_smile())
self._logger.log(f"Wrote {len(ligands)} smiles to file {tmp_input_smiles_path} for taut_enum input.", self._LE.DEBUG)
# 3) run "TautEnum"
list_args = [self._TE.TAUTENUM_I, tmp_input_smiles_path,
self._TE.TAUTENUM_O, tmp_output_smiles_path]
if self._enumerate_protonation:
list_args.append(self._TE.TAUTENUM_ENUM_PROTO)
if self._original_enumeration:
list_args.append(self._TE.TAUTENUM_ORI_ENUM)
if self._add_numbers_to_name:
list_args.append(self._TE.TAUTENUM_ADD_NUMBERS)
result = self._TautEnum_executor.execute(command=self._TE.TAUTENUM,
arguments=list_args,
check=False)
self._logger.log(f"Executed taut_enum (output file: {tmp_output_smiles_path}).", self._LE.DEBUG)
        # 4) log a mapping from ligand numbers to names (debug aid; the mapping itself is not used below)
        self._get_name_dict(ligands)
# 5) load and return the smiles; taut_enum output: "COc1cc(c(c(c1OC)OC)Cl)Cc2nc3c(ncnc3n2CCCC#C)N 0_1"
taut_smiles = []
taut_identity = []
with open(tmp_output_smiles_path, 'r') as f:
for line in f:
self._logger_blank.log(line.rstrip("\n"), self._LE.DEBUG)
line = line.strip().split(sep=' ')
taut_smiles.append(line[0])
taut_identity.append(line[1])
buffer = OrderedDict()
for old_lig in ligands:
key = str(old_lig.get_ligand_number())
buffer[key] = {"lig_list": [], "old_lig": old_lig}
for smile, total_id in zip(taut_smiles, taut_identity):
total_id_parts = total_id.split('_')
ligand_number = int(total_id_parts[0])
            key = str(ligand_number)
            if key in buffer:
                matched_list = buffer[key]["lig_list"]
                old_lig = buffer[key]["old_lig"]
                matched_list.append(Ligand(smile=smile,
                                           original_smile=old_lig.get_original_smile(),
                                           ligand_number=ligand_number,
                                           enumeration=len(matched_list),
                                           molecule=None,
                                           mol_type=None,
                                           name=old_lig.get_name()))
result_list = []
for key in buffer.keys():
old_lig = buffer[key]["old_lig"]
matched_list = buffer[key]["lig_list"]
if len(matched_list) == 0:
result_list.append(old_lig)
continue
for new_lig in matched_list:
result_list.append(new_lig)
        # 6) remove temporary files
shutil.rmtree(tmp_output_dir_path)
return result_list
def _get_name_dict(self, ligands: list):
r_dict = {}
for lig in ligands:
r_dict[str(lig.get_ligand_number())] = lig.get_name()
self._logger.log(f"Using the following dictionary to match the ligand number with the names:\n{r_dict}.", self._LE.DEBUG)
return r_dict
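The identifier matching in `annotate_tautomers` can be sketched stand-alone: taut_enum tags each output SMILES with `<ligand-number>_<tautomer-index>`, and the new enumeration is simply the position within each ligand's matched list. The SMILES and identifiers below are invented.

```python
from collections import OrderedDict

# invented taut_enum-style output lines: "<SMILES> <ligand>_<tautomer>"
output_lines = ["CCO 0_0", "CC=O 0_1", "c1ccccc1 1_0"]

buffer = OrderedDict((str(n), []) for n in (0, 1))  # one slot per input ligand
for line in output_lines:
    smile, total_id = line.strip().split(' ')
    ligand_number = total_id.split('_')[0]
    # the re-assigned enumeration is the current length of the matched list
    buffer[ligand_number].append((smile, len(buffer[ligand_number])))
```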
from dockstream.loggers.ligand_preparation_logger import LigandPreparationLogger
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
from dockstream.core.OpenEye.OpenEye_transformator import OpenEyeTransformator
from dockstream.utils.enums.transformations_enums import TransformationEnum
class TransformatorFactory:
"""Returns a list of transformators."""
def __init__(self, conf):
self._TE = TransformationEnum()
self._LE = LoggingConfigEnum()
self._logger = LigandPreparationLogger()
self._conf = conf
def get_transformators(self) -> list:
transformators = []
for curTransConf in self._conf[self._TE.TRANSFORMATIONS]:
if curTransConf[self._TE.TRANSFORMATION_BACKEND] == self._TE.TRANSFORMATION_BACKEND_OPENEYE:
transformators.append(OpenEyeTransformator(curTransConf))
else:
self._logger.log(f"", self._LE.DEBUG)
return transformators
from typing import List, Optional, Union
from pydantic import BaseModel
from dockstream.core.AutodockVina.AutodockVina_docker import AutodockVina
from dockstream.core.Corina.Corina_ligand_preparator import CorinaLigandPreparator
from dockstream.core.Gold.Gold_docker import Gold
from dockstream.core.OpenEye.OpenEye_docker import OpenEye
from dockstream.core.OpenEye.OpenEye_ligand_preparator import OpenEyeLigandPreparator
from dockstream.core.OpenEyeHybrid.OpenEyeHybrid_docker import OpenEyeHybrid
from dockstream.core.RDkit.RDkit_ligand_preparator import RDkitLigandPreparator
from dockstream.core.Schrodinger.Glide_docker import Glide
from dockstream.core.Schrodinger.Ligprep_ligand_preparator import LigprepLigandPreparator
from dockstream.core.rDock.rDock_docker import rDock
class EnvVariable(BaseModel):
key: str
value: str
class Environment(BaseModel):
export: Optional[List[EnvVariable]]
class Logging(BaseModel):
logfile: str
class Header(BaseModel):
environment: Optional[Environment]
logging: Logging
AnyLigandPreparator = Union[
# CorinaLigandPreparator,
LigprepLigandPreparator,
# OpenEyeLigandPreparator,
# RDkitLigandPreparator,
]
class LigandPreparation(BaseModel):
"""Ligand preparation: Specify embedding pool/ligand preparator."""
embedding_pools: Union[AnyLigandPreparator, List[AnyLigandPreparator]]
AnyDocker = Union[
# AutodockVina,
Glide,
# Gold,
# OpenEye,
# OpenEyeHybrid,
# rDock,
]
class DockingInput(BaseModel):
"""Docking input.
Consists of two big parts: ligand preparation and docking runs/backend.
"""
header: Header
ligand_preparation: LigandPreparation
docking_runs: Union[AnyDocker, Optional[List[AnyDocker]]]
class AzdockInput(BaseModel):
"""Welcome to AZDock."""
docking: DockingInput
from copy import deepcopy
from dockstream.utils.enums.ligand_preparation_enum import LigandPreparationEnum
from dockstream.utils.enums.tag_additions_enum import TagAdditionsEnum
class Ligand:
"""This class bundles all information on a ligand, including all molecule instances present."""
def __init__(self, smile: str, ligand_number: int, enumeration=0, molecule=None, mol_type=None, name=None, original_smile=None):
# initialize
self._LP = LigandPreparationEnum()
self._TA = TagAdditionsEnum()
self._known_types = [self._LP.TYPE_RDKIT, self._LP.TYPE_OPENEYE, self._LP.TYPE_OMEGA,
self._LP.TYPE_CORINA, self._LP.TYPE_GOLD, self._LP.TYPE_LIGPREP, None]
# set attributes
self._smile = self._check_smile(smile)
self._original_smile = original_smile
self._ligand_number = self._check_ligand_number(ligand_number)
self._enumeration = self._check_enumeration(enumeration)
self._molecule = molecule
self._mol_type = self._check_mol_type(mol_type)
self._name = name
self._conformers = []
def __repr__(self):
return "<Ligand id: %s, enumeration: %s, smile: %s>" % (self.get_ligand_number(), self.get_enumeration(), self.get_smile())
def __str__(self):
return f"Ligand id: {self.get_ligand_number()}, enumeration: {self.get_enumeration()}, " + \
f"name: {self.get_name()}, smile: {self.get_smile()}, original_smile: {self.get_original_smile()}, " + \
f"mol_type: {self.get_mol_type()}, has molecule: {True if self.get_molecule() is not None else False}."
def get_clone(self):
clone = Ligand(smile=self.get_smile(),
ligand_number=self.get_ligand_number(),
enumeration=self.get_enumeration(),
molecule=deepcopy(self.get_molecule()),
mol_type=self.get_mol_type(),
name=self.get_name(),
original_smile=self.get_original_smile())
for conformer in self.get_conformers():
clone.add_conformer(deepcopy(conformer))
return clone
def __copy__(self):
return self.get_clone()
def __deepcopy__(self, memo):
return self.get_clone()
def set_name(self, name: str):
self._name = name
def get_name(self) -> str:
return self._name
def add_conformer(self, conformer):
self._conformers.append(conformer)
def set_conformers(self, conformers: list):
self._conformers = conformers
def get_conformers(self):
return self._conformers
def clear_conformers(self):
self._conformers = []
def set_molecule(self, molecule):
self._molecule = molecule
def get_molecule(self):
return self._molecule
def _check_mol_type(self, mol_type) -> str:
if mol_type not in self._known_types:
raise ValueError(f"Type {mol_type} not in list of supported types.")
return mol_type
def set_mol_type(self, mol_type):
self._mol_type = self._check_mol_type(mol_type)
def get_mol_type(self):
return self._mol_type
def _check_smile(self, smile: str) -> str:
if not isinstance(smile, str):
raise ValueError(f"Field smile must be a string not of type {type(smile)}.")
return smile
def set_smile(self, smile: str):
self._smile = self._check_smile(smile)
def get_smile(self):
return self._smile
def set_original_smile(self, smile: str):
self._original_smile = smile
def get_original_smile(self):
return self._original_smile
def _check_ligand_number(self, ligand_number: int):
if not isinstance(ligand_number, int) or ligand_number < 0:
raise ValueError(f"Ligand number must be an integer value (minimally 0), not {ligand_number}.")
return ligand_number
def set_ligand_number(self, ligand_number: int):
self._ligand_number = self._check_ligand_number(ligand_number)
def get_ligand_number(self):
return self._ligand_number
def _check_enumeration(self, enumeration: int):
if not isinstance(enumeration, int) or enumeration < 0:
raise ValueError(f"Enumeration must be an integer value (minimally 0), not {enumeration}.")
return enumeration
def set_enumeration(self, enumeration: int):
self._enumeration = self._check_enumeration(enumeration)
def get_enumeration(self):
return self._enumeration
def get_identifier(self):
return str(self.get_ligand_number()) + ':' + str(self.get_enumeration())
def _add_title_to_molecule(self, molecule, title):
if self.get_mol_type() in [self._LP.TYPE_RDKIT, self._LP.TYPE_CORINA, self._LP.TYPE_GOLD, self._LP.TYPE_OMEGA]:
molecule.SetProp("_Name", str(title))
elif self.get_mol_type() == self._LP.TYPE_OPENEYE:
molecule.SetTitle(str(title))
def _add_tag_to_molecule(self, molecule, tag, value):
if self.get_mol_type() in [self._LP.TYPE_RDKIT, self._LP.TYPE_CORINA, self._LP.TYPE_GOLD, self._LP.TYPE_LIGPREP,
self._LP.TYPE_OMEGA]:
molecule.SetProp(tag, str(value))
elif self.get_mol_type() == self._LP.TYPE_OPENEYE:
import openeye.oechem as oechem
oechem.OESetSDData(molecule, tag, str(value))
else:
raise ValueError(f"Cannot add tags to conformer type {self.get_mol_type()}.")
def add_tags_to_conformers(self):
if len(self.get_conformers()) > 0:
for conformer_number, conformer in enumerate(self.get_conformers()):
self._add_title_to_molecule(conformer, self.get_identifier() + ':' + str(conformer_number))
if self.get_name() is not None:
self._add_tag_to_molecule(conformer, self._TA.TAG_NAME, self.get_name())
self._add_tag_to_molecule(conformer, self._TA.TAG_LIGAND_ID, self.get_ligand_number())
self._add_tag_to_molecule(conformer, self._TA.TAG_ORIGINAL_SMILES, self.get_original_smile())
self._add_tag_to_molecule(conformer, self._TA.TAG_SMILES, self.get_smile())
def add_tags_to_molecule(self):
if self.get_molecule() is not None:
self._add_title_to_molecule(self.get_molecule(), self.get_identifier())
if self.get_name() is not None:
self._add_tag_to_molecule(self.get_molecule(), self._TA.TAG_NAME, self.get_name())
self._add_tag_to_molecule(self.get_molecule(), self._TA.TAG_LIGAND_ID, self.get_ligand_number())
self._add_tag_to_molecule(self.get_molecule(), self._TA.TAG_ORIGINAL_SMILES, self.get_original_smile())
self._add_tag_to_molecule(self.get_molecule(), self._TA.TAG_SMILES, self.get_smile())
def get_next_enumeration_number_for_ligand(ligands: list, ligand_id: int):
max_enumeration = -1
for ligand in ligands:
if ligand.get_ligand_number() == ligand_id:
max_enumeration = max(max_enumeration, ligand.get_enumeration())
return max_enumeration + 1
def get_enumerations_for_ligand(ligands: list, ligand_id: int):
ligand_enumerations = []
for ligand in ligands:
if ligand.get_ligand_number() == ligand_id:
ligand_enumerations.append(deepcopy(ligand))
return ligand_enumerations
def reset_enumerations_for_ligands(ligands: list):
    # renumber enumerations sequentially (starting at 0) per ligand number
    ligand_numbers = list(set([lig.get_ligand_number() for lig in ligands]))
    cur_enum_list = {k: 0 for k in ligand_numbers}
    for lig in ligands:
        cur_number = lig.get_ligand_number()
        lig.set_enumeration(cur_enum_list[cur_number])
        cur_enum_list[cur_number] += 1
def find_ligand(ligands: list, ligand_id: int, enumeration: int = 0):
for ligand in ligands:
if ligand.get_ligand_number() == ligand_id and ligand.get_enumeration() == enumeration:
return ligand
return None
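The module-level helpers above depend only on `get_ligand_number()` and `get_enumeration()`, so a tiny stand-in object suffices to illustrate them; the class name, helper name and numbers below are invented for this sketch.

```python
class _Lig:
    """Tiny stand-in exposing only the two getters the helpers rely on."""
    def __init__(self, number, enumeration):
        self._number, self._enumeration = number, enumeration
    def get_ligand_number(self):
        return self._number
    def get_enumeration(self):
        return self._enumeration

def next_enumeration(ligands, ligand_id):
    # mirrors get_next_enumeration_number_for_ligand: highest existing + 1 (0 if unseen)
    max_enumeration = -1
    for ligand in ligands:
        if ligand.get_ligand_number() == ligand_id:
            max_enumeration = max(max_enumeration, ligand.get_enumeration())
    return max_enumeration + 1

ligands = [_Lig(0, 0), _Lig(0, 1), _Lig(1, 0)]
```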
import os
import pandas as pd
from typing import Optional, Any
from pydantic import BaseModel, PrivateAttr
from rdkit import Chem
from dockstream.core.ligand_preparator import Input
from dockstream.loggers.ligand_preparation_logger import LigandPreparationLogger
from dockstream.utils.dockstream_exceptions import LigandPreparationFailed
from dockstream.core.ligand.ligand import Ligand
from dockstream.utils.smiles import standardize_smiles
from dockstream.utils.enums.ligand_preparation_enum import LigandPreparationEnum
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
from dockstream.utils.enums.docking_enum import DockingConfigurationEnum
from dockstream.utils.smiles import to_smiles
_DE = DockingConfigurationEnum()
_LP = LigandPreparationEnum()
_LE = LoggingConfigEnum()
class LigandInputParser(BaseModel):
"""This class is able to parse various input specifications and produces a list of Ligand objects."""
smiles: Optional[Any]
ligand_number_start: int = 0
input: Input
_logger = PrivateAttr()
class Config:
underscore_attrs_are_private = True
def __init__(self, **data):
super().__init__(**data)
self._logger = LigandPreparationLogger()
# extract parts from the configuration for convenience and try to infer input type, if not explicitly stated
if self.input.type is None:
self._logger.log(
"Input type has not been explicitly specified, will attempt to infer it - this is not recommended.",
_LE.WARNING)
self.input.type = self._infer_input_type()
self.input.type = self.input.type.upper()
#self._do_standardize_smiles = nested_get(self._pool_parameters, [_LP.INPUT,
# _LP.INPUT_STANDARDIZE_SMILES],
# default=False)
def get_ligands(self) -> list:
if self.input.type == _LP.INPUT_TYPE_CONSOLE:
return self._ligands_from_console()
elif self.input.type == _LP.INPUT_TYPE_LIST:
return self._ligands_from_smiles_list(self.smiles)
elif self.input.type == _LP.INPUT_TYPE_SMI:
return self._ligands_from_smi_file()
elif self.input.type == _LP.INPUT_TYPE_CSV:
return self._ligands_from_csv_file()
elif self.input.type == _LP.INPUT_TYPE_SDF:
return self._ligands_from_sdf_file()
else:
            raise LigandPreparationFailed(f"Input file type {self.input.type} is not supported.")
def _ligands_from_console(self) -> list:
ligand_smiles = self.smiles.split(';')
return self._ligands_from_smiles_list(ligand_smiles)
def _ligands_from_smi_file(self) -> list:
        if self.input.input_path is None:
            raise LigandPreparationFailed("When using SMI input, an input path has to be specified.")
with open(self.input.input_path) as f_input:
ligand_smiles = f_input.readlines()
ligand_smiles = [x.strip() for x in ligand_smiles]
return self._ligands_from_smiles_list(ligand_smiles)
def _ligands_from_smiles_list(self, smiles: list) -> list:
#if self._do_standardize_smiles:
# smiles = self._standardize_smiles(smiles)
return_list = []
for number_smile, smile in enumerate(smiles):
return_list.append(Ligand(smile=smile,
original_smile=smile,
ligand_number=number_smile + self.ligand_number_start,
enumeration=0,
molecule=None,
mol_type=None,
name=None))
return return_list
def _ligands_from_csv_file(self) -> list:
        if self.input.input_path is None:
            raise LigandPreparationFailed("When using CSV input, an input path has to be specified.")
        if self.input.columns.smiles is None:
            raise LigandPreparationFailed("When using CSV input, a smiles column has to be specified.")
# load data and check
data = pd.read_csv(self.input.input_path,
delimiter=self.input.delimiter)
if self.input.columns.smiles not in list(data.columns):
raise LigandPreparationFailed(f"Could not find column {self.input.columns.smiles} in input file {self.input.input_path} with columns {list(data.columns)}.")
names_ligands = None
if self.input.columns.names is not None and self.input.columns.names in list(data.columns):
names_ligands = [str(x) for x in data[self.input.columns.names].tolist()]
# generate ligands
ligands = self._ligands_from_smiles_list([str(x) for x in data[self.input.columns.smiles].tolist()])
if names_ligands is not None and len(ligands) == len(names_ligands):
for ligand, name in zip(ligands, names_ligands):
ligand.set_name(name)
return ligands
def _ligands_from_sdf_file(self) -> list:
        if self.input.input_path is None:
            raise LigandPreparationFailed("When using SDF input, an input path has to be specified.")
lig_container = []
mol_supplier = Chem.SDMolSupplier(self.input.input_path, removeHs=False)
        for mol_id, mol in enumerate(mol_supplier):
            if mol is None:
                self._logger.log(f"Molecule number {mol_id} in input SDF file could not be parsed - skipping.",
                                 _LE.WARNING)
                continue
            name = None
if self.input.tags is not None and _LP.INPUT_SDF_TAGNAME_NAMES in self.input.tags.keys():
name_tag = self.input.tags[_LP.INPUT_SDF_TAGNAME_NAMES]
if mol.HasProp(name_tag):
name = str(mol.GetProp(name_tag))
else:
self._logger.log(f"Molecule number {mol_id} in input SDF file does not have name tag {name_tag} - will set to None.",
_LE.DEBUG)
if self.input.initialization_mode == _LP.INITIALIZATION_MODE_ORDER:
lig_container.append(Ligand(smile=to_smiles(mol),
original_smile=to_smiles(mol),
ligand_number=mol_id,
molecule=mol,
mol_type=_LP.TYPE_RDKIT,
name=name))
elif self.input.initialization_mode == _LP.INITIALIZATION_MODE_AZDOCK:
# TODO: fix / handle case where docked poses (with X:X:X) are fed in
parts = str(mol.GetProp("_Name")).split(':')
lig_container.append(Ligand(smile=to_smiles(mol),
original_smile=to_smiles(mol),
ligand_number=int(parts[0]),
enumeration=int(parts[1]),
molecule=mol,
mol_type=_LP.TYPE_RDKIT,
name=name))
else:
raise ValueError(f"Initialization mode {self.input.initialization_mode} is not supported.")
return lig_container
def _standardize_smiles(self, smiles: list) -> list:
# TODO: think about removing this altogether
ligand_smiles = standardize_smiles(smiles, min_heavy_atoms=2, max_heavy_atoms=500,
element_list=None, remove_long_side_chains=False,
neutralise_charges=False)
self._logger.log("Ligand smiles have been standardized.", _LE.DEBUG)
return ligand_smiles
def _infer_input_type(self) -> str:
if self.smiles is not None or self.input.input_path is None:
if isinstance(self.smiles, list):
self._logger.log(f"Inferred pool type {_LP.INPUT_TYPE_LIST}.", _LE.WARNING)
return _LP.INPUT_TYPE_LIST
else:
self._logger.log(f"Inferred pool type {_LP.INPUT_TYPE_CONSOLE}.", _LE.WARNING)
return _LP.INPUT_TYPE_CONSOLE
else:
_, ext = os.path.splitext(self.input.input_path)
ext = ext.lstrip('.').upper()
if ext == _LP.INPUT_TYPE_SMI:
self._logger.log(f"Inferred pool type {_LP.INPUT_TYPE_SMI}.", _LE.WARNING)
return _LP.INPUT_TYPE_SMI
elif ext == _LP.INPUT_TYPE_CSV:
self._logger.log(f"Inferred pool type {_LP.INPUT_TYPE_CSV}.", _LE.WARNING)
return _LP.INPUT_TYPE_CSV
elif ext == _LP.INPUT_TYPE_SDF:
self._logger.log(f"Inferred pool type {_LP.INPUT_TYPE_SDF}.", _LE.WARNING)
return _LP.INPUT_TYPE_SDF
else:
raise LigandPreparationFailed("Could not make educated guess on input type - abort.")
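The extension branch of `_infer_input_type` reduces to an upper-cased file extension; a stand-alone sketch (the function name and file names are invented, the type strings mimic the enum values):

```python
import os

def infer_type_from_path(input_path):
    # mirrors the extension-based fall-through of _infer_input_type
    _, ext = os.path.splitext(input_path)
    ext = ext.lstrip('.').upper()
    if ext in ("SMI", "CSV", "SDF"):
        return ext
    raise ValueError("Could not make educated guess on input type - abort.")
```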
from copy import deepcopy
from typing import List, Optional, Dict, Union
from pydantic import BaseModel, PrivateAttr
from rdkit import Chem, RDLogger
from dockstream.core.RDkit.RDkit_stereo_enumerator import RDKitStereoEnumerator
from dockstream.core.ligand.ligand import find_ligand
from dockstream.core.TautEnum.taut_enum_smile_preparation import TautEnumSmilePreparator
from dockstream.core.factories.transformator_factory import TransformatorFactory
from dockstream.loggers.ligand_preparation_logger import LigandPreparationLogger
from dockstream.utils.dockstream_exceptions import LigandPreparationFailed
from dockstream.utils.enums.docking_enum import DockingConfigurationEnum
from dockstream.utils.enums.ligand_preparation_enum import LigandPreparationEnum
from dockstream.utils.enums.RDkit_enums import RDkitLigandPreparationEnum
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
from dockstream.utils.enums.transformations_enums import TransformationEnum
from dockstream.utils.enums.stereo_enumeration_enums import StereoEnumerationEnum
_DE = DockingConfigurationEnum()
_LP = LigandPreparationEnum()
_RLP = RDkitLigandPreparationEnum()
_LE = LoggingConfigEnum()
_TE = TransformationEnum()
_SE = StereoEnumerationEnum()
class CSVInput(BaseModel):
smiles: str
names: Optional[str] = None
class TautEnumInput(BaseModel):
    prefix_execution: Optional[str] = None
    binary_location: Optional[str] = None
enumerate_protonation: bool = False
class AlignInput(BaseModel):
mode: str
reference_paths: List[str]
reference_format: str
minimum_substructure_ratio: float = 0.2
fail_action: str = "keep"
complete_rings_only: bool = True
tethering: bool = False
class TransformationInput(BaseModel):
type: str
backend: str
smirks: Optional[str]
fail_action: str = "keep"
AnyStereoEnumerator = Union[RDKitStereoEnumerator] # Add more when available.
class Input(BaseModel):
type: Optional[str]
input_path: Optional[str]
tags: Optional[Dict[str, str]] = None
delimiter: Optional[str] = ','
initialization_mode: Optional[str] = _LP.INITIALIZATION_MODE_ORDER
columns: Optional[CSVInput] = None
use_taut_enum: Optional[TautEnumInput] = None
stereo_enumeration: Optional[AnyStereoEnumerator] = None
transformations: Optional[List[TransformationInput]] = None
class Output(BaseModel):
format: str
conformer_path: str
class LigandPreparator(BaseModel):
"""Base class implementing the interface for all docking preparation classes."""
pool_id: str
input: Input
align: Optional[AlignInput] = None
output: Optional[Output]
ligands: Optional[List] = None
_logger = PrivateAttr()
_references: List = PrivateAttr(default=None)
class Config:
underscore_attrs_are_private = True
def __init__(self, **data):
super().__init__(**data)
self._logger = LigandPreparationLogger()
if self.ligands is not None and len(self.ligands) >= 1:
self._initialize_ligands()
def add_ligands(self, ligands):
self.ligands = ligands
self._initialize_ligands()
def _initialize_ligands(self):
# store ligands as list of "Ligands", generated either from smiles or molecules
if not isinstance(self.ligands, list):
self.ligands = [self.ligands]
if len(self.ligands) == 0:
raise LigandPreparationFailed("Specify at least one ligand (or a list).")
# enumerate ligand smiles with tautomers / protomers, if specified
if self.input.use_taut_enum is not None:
self._taut_enum()
# enumerate ligand smiles stereochemically, if specified
if self.input.stereo_enumeration is not None:
self._enumerate_stereoisomers()
# apply transformations (e.g. SMIRKS), if specified
if self.input.transformations is not None:
self._apply_transformations()
# treat the reference molecule(s) and store it internally as a list, if specified
if self.align is not None:
self._load_references()
def _enumerate_stereoisomers(self):
length_before = self.get_number_ligands()
self.ligands = self.input.stereo_enumeration.enumerate(self.ligands)
self._logger.log(f"Enumerated stereo-isomers (expanded {length_before} to {self.get_number_ligands()} enumerations).",
_LE.DEBUG)
def _taut_enum(self):
taut_enum = TautEnumSmilePreparator(enumerate_protonation=self.input.use_taut_enum.enumerate_protonation,
original_enumeration=True,
add_numbers_to_name=True,
prefix_execution=self.input.use_taut_enum.prefix_execution,
binary_location=self.input.use_taut_enum.binary_location)
# taut_enum will return a list of "Ligand" objects, conditionally expanded by enumerated versions
self.ligands = taut_enum.annotate_tautomers(ligands=self.ligands)
self._logger.log("Executed taut_enum.", _LE.INFO)
self._logger.log(f"Stored {len(self.ligands)} smiles from taut_enum output.", _LE.DEBUG)
def _apply_transformations(self):
number_transformations = 0
list_transformators = TransformatorFactory(self.input.transformations).get_transformators()
for transformator in list_transformators:
self.ligands = transformator.transform(self.ligands)
number_transformations += 1
self._logger.log(f"Executed {number_transformations} transformation(s).", _LE.DEBUG)
self._logger.log(f"After transformation stage, {len(self.ligands)} smiles were stored.", _LE.DEBUG)
def set_references(self, references):
# usually, references are loaded from files; but this function allows setting them as a list of molecules
if references is not None:
if not isinstance(references, list):
references = [references]
self._references = references
def _load_references(self):
raise NotImplementedError("This method is backend-specific and must be implemented by each individual child class.")
def get_number_ligands(self):
return len(self.ligands)
def get_ligands(self):
return self.ligands
def get_number_references(self):
if self._references is not None:
return len(self._references)
else:
return None
def get_references(self):
return self._references
def _get_RDkit_aligner(self, conf, ligands):
raise NotImplementedError("This method is backend-specific and must be implemented by each individual child class.")
def generate3Dcoordinates(self):
raise NotImplementedError("This method is backend-specific and must be implemented by each individual child class.")
def align_ligands(self):
raise NotImplementedError("This method is backend-specific and must be implemented by each individual child class.")
def _align_ligands_with_RDkit_preparator(self, ligands: list):
if self.align.mode != _LP.ALIGN_MODE_INTERNAL:
raise LigandPreparationFailed("Only internal alignment supported at the moment.")
if self._references is None:
raise LigandPreparationFailed("No reference molecule has been found.")
# at this stage, "generate3Dcoordinates()" has been used to generate the conformers; use the internal alignment to a reference
RDLogger.DisableLog("rdApp.*")
rdkit_conf = {_LP.POOLID: "dummyPool",
_LP.INPUT: {},
_LP.TYPE: _LP.TYPE_RDKIT,
_LP.PARAMS: {
_RLP.EP_PARAMS_COORDGEN: {
_RLP.EP_PARAMS_COORDGEN_METHOD: _RLP.EP_PARAMS_COORDGEN_UFF,
_RLP.EP_PARAMS_COORDGEN_UFF_MAXITERS: 600
}
},
_LP.ALIGN: deepcopy(self.align.dict())}
aligner = self._get_RDkit_aligner(conf=rdkit_conf,
ligands=[deepcopy(lig) for lig in ligands])
aligner.align_ligands()
# overwrite molecules for those ligands, that could be aligned
for aligned_lig in aligner.get_ligands():
ligand = find_ligand(ligands=ligands,
ligand_id=aligned_lig.get_ligand_number(),
enumeration=aligned_lig.get_enumeration())
if ligand is not None:
ligand.set_molecule(aligned_lig.get_molecule())
return ligands
def write_ligands(self, path, format):
format = format.upper()
ligands_copy = [deepcopy(lig) for lig in self.ligands]
# check and specify format of file
# RDkit does not support the write-out of MOL2 files (apparently because of the format's inherent ambiguity)
if format == _LP.OUTPUT_FORMAT_SDF:
writer = Chem.SDWriter(path)
for lig in ligands_copy:
lig.add_tags_to_molecule()
if lig.get_molecule() is not None:
mol = deepcopy(lig.get_molecule())
mol.SetProp("_Name", lig.get_identifier())
writer.write(mol)
writer.close()
elif format == _LP.OUTPUT_FORMAT_MAE:
raise LigandPreparationFailed("Write-out as maestro file not yet implemented.")
else:
raise LigandPreparationFailed("Specified output format unknown.")
self._logger.log(f"Wrote {len(self.ligands)} molecules to file {path} (format: {format}).", _LE.DEBUG)
import warnings
from pdbfixer import PDBFixer
from simtk.openmm.vec3 import Vec3
from simtk.openmm.app import PDBFile
from dockstream.loggers.target_preparation_logger import TargetPreparationLogger
from dockstream.utils.dockstream_exceptions import TargetPreparationFailed
from dockstream.containers.target_preparation_container import TargetPreparationContainer
from dockstream.utils.enums.target_preparation_enum import TargetPreparationEnum
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
class PDBPreparator:
"""Wrapper class for "PDBFixer" functionality."""
def __init__(self, conf: TargetPreparationContainer):
self._TE = TargetPreparationEnum()
self._TL = LoggingConfigEnum()
self._logger = TargetPreparationLogger()
self._config = conf
def fix_pdb(self, input_pdb_file, output_pdb_file):
"""Function that loads a PDB file and writes a fixed version to another PDB file."""
if self._TE.FIX not in self._config[self._TE.TARGETPREP].keys():
raise TargetPreparationFailed("Cannot fix target, if respective configuration block is missing.")
paras = self._config[self._TE.TARGETPREP][self._TE.FIX]
# load the target from a PDB file (generated temporarily in the child classes)
fixer = PDBFixer(filename=input_pdb_file)
self._logger.log(f"Initialized PDBFixer.", self._TL.DEBUG)
# perform fixes specified in the configuration
fixer.findMissingResidues()
fixer.findNonstandardResidues()
if paras[self._TE.FIX_STANDARDIZE]:
fixer.replaceNonstandardResidues()
self._logger.log(f"Replaced {len(fixer.nonstandardResidues)} non-standard residues.", self._TL.DEBUG)
if paras[self._TE.FIX_REMOVEHETEROGENS]:
fixer.removeHeterogens(keepWater=True)
self._logger.log("Removed heterogens.", self._TL.DEBUG)
if paras[self._TE.FIX_MISSINGHEAVYATOMS]:
fixer.findMissingAtoms()
fixer.addMissingAtoms()
self._logger.log(f"Added {len(fixer.missingAtoms)} missing atoms.", self._TL.DEBUG)
if paras[self._TE.FIX_MISSINGHYDROGENS]:
fixer.addMissingHydrogens(pH=7.0)
self._logger.log("Added missing hydrogens.", self._TL.DEBUG)
if paras[self._TE.FIX_ADDWATERBOX]:
# one could use the crystallographic unit cell, but that might be missing from the HEADER, so go for a
# cubic cell spanning the structure instead
maxSize = max(max((pos[i] for pos in fixer.positions)) -
min((pos[i] for pos in fixer.positions)) for i in range(3))
boxSize = maxSize * Vec3(1, 1, 1)
fixer.addSolvent(boxSize)
self._logger.log("Added water box.", self._TL.DEBUG)
# write out the fixed target as another PDB file
with warnings.catch_warnings():
warnings.simplefilter("ignore")
PDBFile.writeFile(fixer.topology, fixer.positions, open(output_pdb_file, 'w'))
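The water-box edge length above is the largest per-axis spread of the atom positions; the arithmetic can be illustrated with plain coordinate tuples (`compute_box_edge` is a hypothetical helper, not part of DockStream):

```python
def compute_box_edge(positions):
    """Return the edge length of the smallest cube spanning all positions:
    for each axis, take the spread (max - min) of the coordinates, then use
    the largest spread as the edge -- the same logic as in fix_pdb above."""
    return max(
        max(pos[i] for pos in positions) - min(pos[i] for pos in positions)
        for i in range(3)
    )

positions = [(0.0, 1.0, 2.0), (3.0, 1.5, 2.5), (1.0, 4.0, 2.2)]
print(compute_box_edge(positions))  # spreads are 3.0 (x), 3.0 (y), 0.5 (z) -> 3.0
```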
import os
import tempfile
import shutil
import multiprocessing
from copy import deepcopy
from typing import Optional, List, Any
import rdkit.Chem as Chem
from pydantic import BaseModel
from typing_extensions import Literal
from dockstream.core.Schrodinger.Glide_docker import Parallelization
from dockstream.core.docker import Docker
from dockstream.core.rDock.rDock_result_parser import rDockResultParser
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
from dockstream.utils.execute_external.rDock import rDockExecutor
from dockstream.utils.enums.rDock_enums import rDockExecutablesEnum, rDockDockingConfigurationEnum, rDockRbdockOutputEnum
from dockstream.utils.enums.RDkit_enums import RDkitLigandPreparationEnum
from dockstream.utils.translations.molecule_translator import MoleculeTranslator
from dockstream.utils.dockstream_exceptions import DockingRunFailed
_LE = LoggingConfigEnum()
_LP = RDkitLigandPreparationEnum()
_CE = rDockDockingConfigurationEnum()
_EE = rDockExecutablesEnum()
_ROE = rDockRbdockOutputEnum()
class rDockParameters(BaseModel):
prefix_execution: Optional[str] = None
binary_location: Optional[str] = None
parallelization: Optional[Parallelization]
rbdock_prm_paths: List[str]
number_poses: int
def get(self, key: str) -> Any:
"""Temporary method to support nested_get"""
return self.dict()[key]
class rDock(Docker):
"""Interface to the "rDock" backend."""
backend: Literal["rDock"] = "rDock"
parameters: rDockParameters
_rDock_executor: rDockExecutor = None
class Config:
underscore_attrs_are_private = True
def __init__(self, **run_parameters):
super().__init__(**run_parameters)
def _initialize_executor(self):
"""Initialize the executor for all "rDock" related calls and also check if it is available."""
if self._rDock_executor is None:
self._rDock_executor = rDockExecutor(
prefix_execution=self.parameters.prefix_execution,
binary_location=self.parameters.binary_location
)
if not self._rDock_executor.is_available():
raise DockingRunFailed("Cannot initialize rDock docker, as rDock backend is not available - abort.")
self._rDock_executor.set_env_vars()
self._logger.log(f"Checked rDock backend availability (prefix_execution={self.parameters.prefix_execution}).", _LE.DEBUG)
def _get_score_from_conformer(self, conformer):
return float(conformer.GetProp(_ROE.SCORE))
def add_molecules(self, molecules: list):
mol_trans = MoleculeTranslator(self.ligands, force_mol_type=_LP.TYPE_RDKIT)
mol_trans.add_molecules(molecules)
self.ligands = mol_trans.get_as_rdkit()
self._docking_performed = False
def _generate_temporary_input_output_files(self, start_indices, sublists):
# in case singletons are handed over, wrap them in a list for "zipping" later
if not isinstance(start_indices, list):
start_indices = [start_indices]
if not isinstance(sublists, list):
sublists = [sublists]
tmp_output_dirs = []
tmp_input_sdf_paths = []
tmp_output_sdf_paths = []
for start_index, sublist in zip(start_indices, sublists):
# generate temporary input file and output directory into which "rbdock" will deposit the poses
cur_tmp_output_dir = tempfile.mkdtemp()
_, cur_tmp_sdf = tempfile.mkstemp(prefix=str(start_index), suffix=".sdf", dir=cur_tmp_output_dir)
# write-out the temporary input file
one_written = False
writer = Chem.SDWriter(cur_tmp_sdf)
for ligand in sublist:
# only write ligands that were successfully embedded (failed ones have no molecule)
if ligand.get_molecule() is not None:
mol = deepcopy(ligand.get_molecule())
one_written = True
mol.SetProp("_Name", ligand.get_identifier())
writer.write(mol)
writer.close()
if not one_written:
if os.path.isdir(cur_tmp_output_dir):
shutil.rmtree(cur_tmp_output_dir)
continue
tmp_output_dirs.append(cur_tmp_output_dir)
tmp_input_sdf_paths.append(cur_tmp_sdf)
tmp_output_sdf_paths.append('.'.join([cur_tmp_output_dir, "sd"]))
return tmp_output_dirs, tmp_input_sdf_paths, tmp_output_sdf_paths
def _dock(self, number_cores):
self._initialize_executor()
start_indices, sublists = self.get_sublists_for_docking(number_cores=number_cores)
number_sublists = len(sublists)
self._logger.log(f"Split ligands into {number_sublists} sublists for docking.", _LE.DEBUG)
jobs_submitted = 0
slices_per_iteration = min(number_cores, number_sublists)
while jobs_submitted < len(sublists):
upper_bound_slice = min((jobs_submitted + slices_per_iteration), len(sublists))
cur_slice_start_indices = start_indices[jobs_submitted:upper_bound_slice]
cur_slice_sublists = sublists[jobs_submitted:upper_bound_slice]
# generate temporary paths and input files (ligands that failed embedding are skipped there)
tmp_output_dirs, tmp_input_sdf_paths, \
tmp_output_sdf_paths = self._generate_temporary_input_output_files(cur_slice_start_indices,
cur_slice_sublists)
# run in parallel
processes = []
for chunk_index in range(len(tmp_output_dirs)):
p = multiprocessing.Process(target=self._dock_subjob, args=(tmp_input_sdf_paths[chunk_index],
tmp_output_dirs[chunk_index],
tmp_output_sdf_paths[chunk_index]))
processes.append(p)
p.start()
jobs_submitted += 1
for p in processes:
p.join()
# load the chunks and recombine the result; add conformations
for chunk_index in range(len(tmp_output_dirs)):
if not os.path.isfile(tmp_output_sdf_paths[chunk_index]) or os.path.getsize(tmp_output_sdf_paths[chunk_index]) == 0:
continue
# do not sanitize, because rDock sometimes produces stuff that cannot be kekulized
for molecule in Chem.SDMolSupplier(tmp_output_sdf_paths[chunk_index], sanitize=False, removeHs=False):
# ligands with "impossible chemistry" may be loaded by RDkit as "None"
if molecule is None:
continue
cur_conformer_name = str(molecule.GetProp(_ROE.NAME))
# add molecule to the appropriate ligand
for ligand in self.ligands:
if ligand.get_identifier() == cur_conformer_name:
ligand.add_conformer(molecule)
break
# clean-up
for path in tmp_output_dirs:
shutil.rmtree(path)
self._log_docking_progress(number_done=jobs_submitted, number_total=number_sublists)
# sort the conformers (best to worst), update their names to contain the conformer id and add tags
# -> <ligand_number>:<enumeration>:<conformer_number>
for ligand in self.ligands:
ligand.set_conformers(sorted(ligand.get_conformers(),
key=lambda x: float(x.GetProp(_ROE.SCORE)), reverse=False))
ligand.add_tags_to_conformers()
# log any docking fails
self._docking_fail_check()
# parse the result of the docking step
result_parser = rDockResultParser([ligand.get_clone() for ligand in self.ligands])
self._df_results = result_parser.as_dataframe()
# docking flag
self._docking_performed = True
def _dock_subjob(self, input_path_sdf, output_dir_path, output_sdf_path):
# set up arguments list and execute
# for an explanation of the parameters, see "rDockExecutablesEnum"
# TODO: support "ensemble docking" - currently, only the first entry is used
arguments = [_EE.RBDOCK_R, self.parameters.rbdock_prm_paths[0],
_EE.RBDOCK_I, input_path_sdf,
_EE.RBDOCK_O, output_dir_path,
_EE.RBDOCK_N, str(self.parameters.number_poses),
_EE.RBDOCK_S, str(_EE.RBDOCK_S_DEFAULT),
_EE.RBDOCK_P, _EE.RBDOCK_P_DEFAULT]
execution_result = self._rDock_executor.execute(command=_EE.RBDOCK,
arguments=arguments,
check=True)
self._delay4file_system(path=output_sdf_path)
self._logger.log(f"Finished sublist (input: {input_path_sdf}, output directory: {output_dir_path}).", _LE.DEBUG)
def write_docked_ligands(self, path, mode="all"):
self._write_docked_ligands(path, mode, mol_type=_LP.TYPE_RDKIT)
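The submission loop in `_dock` above works through the sublists in batches of at most `number_cores` processes, joining each batch before starting the next. The batch arithmetic can be sketched in isolation (`slice_bounds` is an illustrative helper, not part of DockStream):

```python
def slice_bounds(n_sublists, number_cores):
    """Yield (start, stop) index pairs covering n_sublists in batches of
    at most number_cores, mirroring the submission loop in rDock._dock."""
    jobs_submitted = 0
    slices_per_iteration = min(number_cores, n_sublists)
    while jobs_submitted < n_sublists:
        upper_bound = min(jobs_submitted + slices_per_iteration, n_sublists)
        yield jobs_submitted, upper_bound
        jobs_submitted = upper_bound

print(list(slice_bounds(7, 3)))  # [(0, 3), (3, 6), (6, 7)]
```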
import pandas as pd
from dockstream.core.result_parser import ResultParser
from dockstream.utils.enums.rDock_enums import rDockRbdockOutputEnum, rDockResultKeywordsEnum
class rDockResultParser(ResultParser):
"""Class that loads, parses and analyzes the output of an "rDock" docking run, including poses and scores."""
def __init__(self, ligands):
super().__init__(ligands=ligands)
self._ROE = rDockRbdockOutputEnum()
self._RK = rDockResultKeywordsEnum()
self._df_results = self._construct_dataframe()
def _construct_dataframe(self) -> pd.DataFrame:
def func_get_score(conformer):
return float(conformer.GetProp(self._ROE.SCORE))
return super()._construct_dataframe_with_funcobject(func_get_score)
import abc
import pandas as pd
import warnings
from copy import deepcopy
from dockstream.core.ligand.ligand import Ligand
from dockstream.utils.dockstream_exceptions import ResultParsingFailed
from dockstream.loggers.docking_logger import DockingLogger
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
from dockstream.utils.enums.docking_enum import ResultKeywordsEnum
class ResultParser(metaclass=abc.ABCMeta):
"""Base class implementing the interface result parsing classes."""
def __init__(self, ligands):
self._LE = LoggingConfigEnum()
self._logger = DockingLogger()
self._RK = ResultKeywordsEnum()
self._ligands = ligands
self._df_results = None
def as_dataframe(self, aggregate=False):
if aggregate:
warnings.warn("For now, \"aggregate\" is not available.")
if isinstance(aggregate, bool) and aggregate is False:
return deepcopy(self._df_results)
else:
raise ResultParsingFailed("Parameter aggregate has an illegal value.")
@staticmethod
def _get_name(ligand: Ligand, conformer_index: int):
"""Return either the name (for named molecules) or the identifier plus the conformer index for the dataframe."""
if ligand.get_name() is None:
return ligand.get_identifier() + ':' + str(conformer_index)
else:
return ligand.get_name()
def _construct_dataframe_with_funcobject(self, func_get_score) -> pd.DataFrame:
data_buffer = []
for ligand in self._ligands:
best = True
for conformer_index, conformer in enumerate(ligand.get_conformers()):
name = self._get_name(ligand, conformer_index)
row = [ligand.get_ligand_number(),
ligand.get_enumeration(),
conformer_index,
name,
func_get_score(conformer),
ligand.get_smile(),
best]
best = False
data_buffer.append(row)
return pd.DataFrame(data_buffer, columns=[self._RK.DF_LIGAND_NUMBER,
self._RK.DF_LIGAND_ENUMERATION,
self._RK.DF_CONFORMER,
self._RK.DF_LIGAND_NAME,
self._RK.DF_SCORE,
self._RK.DF_SMILES,
self._RK.DF_LOWEST_CONFORMER])
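The `best` flag bookkeeping above marks the first conformer of each ligand (conformers arrive pre-sorted, best first) and clears the flag for the rest. The row construction can be sketched without pandas; `build_rows` is a hypothetical helper, not part of DockStream:

```python
def build_rows(ligands_conformers):
    """Flatten ligand -> sorted score lists into result rows, flagging the
    first conformer of each ligand as the lowest-scoring one, as done in
    ResultParser._construct_dataframe_with_funcobject."""
    rows = []
    for identifier, scores in ligands_conformers.items():
        best = True
        for conformer_index, score in enumerate(scores):
            rows.append((identifier, conformer_index, score, best))
            best = False
    return rows

print(build_rows({"0:0": [-9.1, -8.7]}))
# [('0:0', 0, -9.1, True), ('0:0', 1, -8.7, False)]
```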
from abc import ABC
from pydantic import BaseModel, PrivateAttr
from dockstream.loggers.ligand_preparation_logger import LigandPreparationLogger
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
from dockstream.utils.enums.stereo_enumeration_enums import StereoEnumerationEnum
_LE = LoggingConfigEnum()
_SE = StereoEnumerationEnum()
class StereoEnumerator(ABC, BaseModel):
_logger = PrivateAttr()
class Config:
underscore_attrs_are_private = True
def __init__(self, **data):
super().__init__(**data)
self._logger = LigandPreparationLogger()
def enumerate(self, ligands: list) -> list:
raise NotImplementedError
import abc
from dockstream.containers.target_preparation_container import TargetPreparationContainer
from dockstream.utils.enums.target_preparation_enum import TargetPreparationEnum
from dockstream.loggers.target_preparation_logger import TargetPreparationLogger
from dockstream.loggers.blank_logger import BlankLogger
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
class TargetPreparator(metaclass=abc.ABCMeta):
"""Virtual base class implementing the interface for all specific target preparators and the general preparation
of the docking target."""
def __init__(self, conf: TargetPreparationContainer, run_number=0):
self._TE = TargetPreparationEnum()
self._TL = LoggingConfigEnum()
self._logger = TargetPreparationLogger()
self._logger_blank = BlankLogger()
self._config = conf
self._target = None
# store the specific parameters for this very run for easy access later; the others are ignored
self._run_parameters = self._config[self._TE.TARGETPREP][self._TE.RUNS][run_number]
def get_target(self):
return self._target
def specify_cavity(self):
raise NotImplementedError("This method needs to be overridden by child classes.")
def write_target(self, path):
raise NotImplementedError("This method needs to be overridden by child classes.")
import abc
from dockstream.utils.dockstream_exceptions import LigandPreparationFailed
from dockstream.loggers.ligand_preparation_logger import LigandPreparationLogger
from dockstream.loggers.blank_logger import BlankLogger
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
from dockstream.utils.enums.transformations_enums import TransformationEnum
class Transformator(metaclass=abc.ABCMeta):
"""Base class implementing the interface for transformations applied to smiles before embedding."""
def __init__(self, conf):
self._LE = LoggingConfigEnum()
self._TE = TransformationEnum()
self._logger = LigandPreparationLogger()
self._logger_blank = BlankLogger()
self._conf = conf
# extract type specification
self._backend = conf[self._TE.TRANSFORMATION_BACKEND]
if self._backend not in [self._TE.TRANSFORMATION_BACKEND_OPENEYE]:
self._logger.log(f"Transformation backend {self._backend} is unknown.", self._LE.ERROR)
raise LigandPreparationFailed(f"Transformation backend {self._backend} is unknown.")
# extract backend specification
self._type = conf[self._TE.TRANSFORMATION_TYPE]
if self._type == self._TE.TRANSFORMATION_TYPE_SMIRKS:
self._smirk = conf[self._TE.TRANSFORMATION_SMIRKS]
else:
self._logger.log(f"Transformation type {self._type} is unknown.", self._LE.ERROR)
raise LigandPreparationFailed(f"Transformation type {self._type} is unknown.")
# treat fail action specification
self._fail_action = conf[self._TE.TRANSFORMATION_FAIL_ACTION]
if self._fail_action not in [self._TE.TRANSFORMATION_FAIL_ACTION_KEEP,
self._TE.TRANSFORMATION_FAIL_ACTION_DISCARD]:
self._logger.log(f"Fail action {self._fail_action} is unknown.", self._LE.ERROR)
raise LigandPreparationFailed(f"Fail action {self._fail_action} is unknown.")
def transform(self, ligands: list) -> list:
raise NotImplementedError
from abc import ABC, abstractmethod
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
class BaseLogger(ABC):
def __init__(self):
self._LE = LoggingConfigEnum()
self._logger = self._initialize_logger()
def log(self, message: str, level: str):
if level == self._LE.DEBUG:
self._logger.debug(message)
elif level == self._LE.INFO:
self._logger.info(message)
elif level == self._LE.WARNING:
self._logger.warning(message)
elif level == self._LE.ERROR:
self._logger.error(message)
elif level == self._LE.EXCEPTION:
self._logger.exception(message)
else:
raise ValueError("Logger level not supported.")
@abstractmethod
def _initialize_logger(self):
raise NotImplementedError("Override this method in child classes.")
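The if/elif chain in `BaseLogger.log` can equivalently be written as a dispatch table; a standalone sketch using stdlib `logging` level names (illustrative only, not the DockStream implementation -- DockStream's level strings come from `LoggingConfigEnum`):

```python
import logging

# hypothetical dispatch-table variant of BaseLogger.log
_DISPATCH = {
    "debug": logging.Logger.debug,
    "info": logging.Logger.info,
    "warning": logging.Logger.warning,
    "error": logging.Logger.error,
    "exception": logging.Logger.exception,
}

def log(logger, message, level):
    method = _DISPATCH.get(level)
    if method is None:
        raise ValueError("Logger level not supported.")
    method(logger, message)
```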
import logging
from dockstream.loggers.base_logger import BaseLogger
class BlankLogger(BaseLogger):
"""This logger serves as a "verbatim" interface."""
def __init__(self):
super().__init__()
def _initialize_logger(self):
logger = logging.getLogger(self._LE.LOGGER_BLANK)
return logger
import logging
from dockstream.loggers.base_logger import BaseLogger
class DockingLogger(BaseLogger):
def __init__(self):
super().__init__()
def _initialize_logger(self):
logger = logging.getLogger(self._LE.LOGGER_DOCKING)
return logger
import logging
#from torch.utils.tensorboard import SummaryWriter
from dockstream.loggers.base_logger import BaseLogger
#from utils.logging.tensorboard import add_mols
class InterfaceLogger(BaseLogger):
def __init__(self):
super().__init__()
#self._summary_writer = self._instantiate_summary_writer(configuration)
def _initialize_logger(self):
logger = logging.getLogger(self._LE.LOGGER_INTERFACE)
return logger
#def __del__(self):
# self._summary_writer.close()
#def _log_timestep(self, smiles: np.array, likelihoods: np.array):
# fraction_valid_smiles = utils_general.fraction_valid_smiles(smiles)
# fraction_unique_entries = self._get_unique_entires_fraction(likelihoods)
# self._visualize_structures(smiles)
# self._summary_writer.add_text('Data', f'Valid SMILES: {fraction_valid_smiles}% '
# f'Unique Mols: {fraction_unique_entries}% ')
#def _visualize_structures(self, smiles):
# list_of_labels, list_of_mols = self._count_unique_inchi_keys(smiles)
# if len(list_of_mols) > 0:
# add_mols(self._summary_writer, "Most Frequent Molecules", list_of_mols, self._rows, list_of_labels)
#def _instantiate_summary_writer(self, configuration):
# log_config = SamplingLoggerConfiguration(**configuration.logging)
# return SummaryWriter(log_dir=log_config.logging_path)
import logging
from dockstream.loggers.base_logger import BaseLogger
class LigandPreparationLogger(BaseLogger):
def __init__(self):
super().__init__()
def _initialize_logger(self):
logger = logging.getLogger(self._LE.LOGGER_LIGAND_PREPARATION)
return logger
import logging
from dockstream.loggers.base_logger import BaseLogger
class TargetPreparationLogger(BaseLogger):
def __init__(self):
super().__init__()
def _initialize_logger(self):
logger = logging.getLogger(self._LE.LOGGER_TARGET_PREPARATION)
return logger
import argparse
def str2bool(inp):
if isinstance(inp, bool):
return inp
if inp.lower() in ("yes", "true", 't', 'y', '1'):
return True
elif inp.lower() in ("no", "false", 'f', 'n', '0'):
return False
else:
raise argparse.ArgumentTypeError("Expected castable string or boolean value as input.")
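A self-contained usage sketch of `str2bool` as an `argparse` type converter (the function is repeated here so the snippet runs on its own; the `--print_scores` flag name is illustrative):

```python
import argparse

def str2bool(inp):
    if isinstance(inp, bool):
        return inp
    if inp.lower() in ("yes", "true", "t", "y", "1"):
        return True
    elif inp.lower() in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("Expected castable string or boolean value as input.")

parser = argparse.ArgumentParser()
parser.add_argument("--print_scores", type=str2bool, default=False)
print(parser.parse_args(["--print_scores", "yes"]).print_scores)  # True
```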
class LigandPreparationFailed(Exception):
pass
class ConfigParsingFailed(Exception):
pass
class DockingRunFailed(Exception):
pass
class TargetPreparationFailed(Exception):
pass
class ResultParsingFailed(Exception):
pass
class TransformationFailed(Exception):
pass
def get_exception_message(e: Exception):
if hasattr(e, "message"):
return e.message
else:
return e
from dockstream.core.ligand.ligand_input_parser import LigandInputParser
from dockstream.utils.dockstream_exceptions import *
from dockstream.core.RDkit.RDkit_ligand_preparator import RDkitLigandPreparator
from dockstream.core.OpenEye.OpenEye_ligand_preparator import OpenEyeLigandPreparator
from dockstream.core.Corina.Corina_ligand_preparator import CorinaLigandPreparator
from dockstream.core.Schrodinger.Ligprep_ligand_preparator import LigprepLigandPreparator
from dockstream.core.OpenEyeHybrid.Omega_ligand_preparator import OmegaLigandPreparator
from dockstream.utils.enums.docking_enum import DockingConfigurationEnum
from dockstream.utils.enums.ligand_preparation_enum import LigandPreparationEnum
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
def embed_ligands(smiles, pool_number, pool, logger, ligand_number_start=0):
# enums
_LE = LoggingConfigEnum()
_LP = LigandPreparationEnum()
_DE = DockingConfigurationEnum()
# 1) load and parse the input, whether from command-line or from the configuration
# note that if "args.smiles" is not None, those smiles will be used irrespective of the configuration
lig_inp_parser = LigandInputParser(smiles=smiles,
ligand_number_start=ligand_number_start,
**pool)
list_ligands = lig_inp_parser.get_ligands()
if len(list_ligands) == 0:
raise LigandPreparationFailed("No smiles found in input.")
logger.log(f"Loaded {len(list_ligands)} molecules.", _LE.DEBUG)
# 2) do the embedding
if pool[_LP.TYPE] == _LP.TYPE_RDKIT:
prep = RDkitLigandPreparator(ligands=list_ligands, pool_number=pool_number, **pool)
elif pool[_LP.TYPE] == _LP.TYPE_OPENEYE:
prep = OpenEyeLigandPreparator(ligands=list_ligands, pool_number=pool_number, **pool)
elif pool[_LP.TYPE] == _LP.TYPE_CORINA:
prep = CorinaLigandPreparator(ligands=list_ligands, pool_number=pool_number, **pool)
elif pool[_LP.TYPE] == _LP.TYPE_LIGPREP:
prep = LigprepLigandPreparator(ligands=list_ligands, pool_number=pool_number, **pool)
elif pool[_LP.TYPE] == _LP.TYPE_OMEGA:
prep = OmegaLigandPreparator(ligands=list_ligands, pool_number=pool_number, **pool)
else:
raise LigandPreparationFailed("Type of pool is unknown.")
# generate 3D coordinates (embed), if not using SDF input
if _LP.INPUT_TYPE not in pool[_LP.INPUT].keys() or pool[_LP.INPUT][_LP.INPUT_TYPE].upper() != _LP.INPUT_TYPE_SDF:
prep.generate3Dcoordinates()
else:
logger.log("As input is SDF, coordinate generation is skipped.", _LE.INFO)
# 3) (optional) do alignment
if _LP.ALIGN in pool.keys():
prep.align_ligands()
# 4) (optional) write the molecules to the disk
if _LP.OUTPUT in pool.keys():
prep.write_ligands(path=pool[_LP.OUTPUT][_LP.OUTPUT_CONFORMERPATH],
format=pool[_LP.OUTPUT][_LP.OUTPUT_FORMAT])
# 5) save the ligands in the respective pool
# note that a "pool" represents an embedded collection of molecules
return prep
import os
import logging.config as logging_config
from dockstream.loggers.interface_logger import InterfaceLogger
from dockstream.utils.files_paths import dict_from_json_file
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
from dockstream.utils.general_utils import *
_LE = LoggingConfigEnum()
def initialize_logging(config, task, _task_enum, log_conf_path):
if in_keys(config, [task, _task_enum.HEADER]):
log_conf_dict = dict_from_json_file(log_conf_path)
if in_keys(config, [task, _task_enum.HEADER, _task_enum.LOGGING]):
if in_keys(config, [task, _task_enum.HEADER, _task_enum.LOGGING, _task_enum.LOGGING_LOGFILE]):
try:
log_conf_dict["handlers"]["file_handler"]["filename"] = config[task][_task_enum.HEADER][_task_enum.LOGGING][_task_enum.LOGGING_LOGFILE]
log_conf_dict["handlers"]["file_handler_blank"]["filename"] = config[task][_task_enum.HEADER][_task_enum.LOGGING][_task_enum.LOGGING_LOGFILE]
except KeyError:
pass
logging_config.dictConfig(log_conf_dict)
else:
logging_config.dictConfig(dict_from_json_file(log_conf_path))
logger = InterfaceLogger()
logger.log(f"DockStream version used: {parse_setuppy()['version']}", _LE.INFO)
return logger
def set_environment(config, task, _task_enum, logger):
if in_keys(config, [task, _task_enum.HEADER]):
if in_keys(config, [task, _task_enum.HEADER, _task_enum.ENVIRONMENT]):
if in_keys(config, [task, _task_enum.HEADER, _task_enum.ENVIRONMENT, _task_enum.ENVIRONMENT_EXPORT]):
exp_vars = config[task][_task_enum.HEADER][_task_enum.ENVIRONMENT][_task_enum.ENVIRONMENT_EXPORT]
for export in exp_vars:
os.environ[export[_task_enum.ENVIRONMENT_EXPORT_KEY]] = export[_task_enum.ENVIRONMENT_EXPORT_VALUE]
logger.log(f"Added environment variable {export[_task_enum.ENVIRONMENT_EXPORT_KEY]}: {export[_task_enum.ENVIRONMENT_EXPORT_VALUE]}.", _LE.DEBUG)
from dockstream.utils.enums.docking_enum import DockingConfigurationEnum, ResultKeywordsEnum
from dockstream.utils.enums.ligand_preparation_enum import LigandPreparationEnum
from dockstream.utils.enums.logging_enums import LoggingConfigEnum
from dockstream.utils.general_utils import *
def handle_poses_writeout(docking_run, docker, output_prefix):
_LE = LoggingConfigEnum()
_LP = LigandPreparationEnum()
_DE = DockingConfigurationEnum()
if in_keys(docking_run, [_DE.OUTPUT, _DE.OUTPUT_POSES]):
if in_keys(docking_run, [_DE.OUTPUT, _DE.OUTPUT_POSES, _DE.OUTPUT_POSES_PATH]):
poses_path = docking_run[_DE.OUTPUT][_DE.OUTPUT_POSES][_DE.OUTPUT_POSES_PATH]
poses_path = docker.apply_prefix_to_filename(poses_path, output_prefix)
# if the overwrite flag is set and the output file exists already, append number to basename
if nested_get(docking_run, [_DE.OUTPUT,
_DE.OUTPUT_POSES,
_DE.OUTPUT_POSES_OVERWRITE],
default=False):
poses_path = docker.update_path_to_unused(path=poses_path)
mode = nested_get(docking_run, [_DE.OUTPUT, _DE.OUTPUT_POSES, _DE.OUTPUT_MODE],
default=_DE.OUTPUT_MODE_ALL)
docker.write_docked_ligands(path=poses_path, mode=mode)
def handle_scores_writeout(docking_run, docker, output_prefix):
_LE = LoggingConfigEnum()
_LP = LigandPreparationEnum()
_DE = DockingConfigurationEnum()
if in_keys(docking_run, [_DE.OUTPUT, _DE.OUTPUT_SCORES]):
if in_keys(docking_run, [_DE.OUTPUT, _DE.OUTPUT_SCORES, _DE.OUTPUT_SCORES_PATH]):
scores_path = docking_run[_DE.OUTPUT][_DE.OUTPUT_SCORES][_DE.OUTPUT_SCORES_PATH]
scores_path = docker.apply_prefix_to_filename(scores_path, output_prefix)
# if the overwrite flag is set and the output file exists already, append number to basename
if nested_get(docking_run, [_DE.OUTPUT, _DE.OUTPUT_SCORES, _DE.OUTPUT_SCORES_OVERWRITE],
default=False):
scores_path = docker.update_path_to_unused(path=scores_path)
mode = nested_get(docking_run, [_DE.OUTPUT, _DE.OUTPUT_SCORES, _DE.OUTPUT_MODE],
default=_DE.OUTPUT_MODE_ALL)
docker.write_result(path=scores_path, mode=mode)
def handle_score_printing(print_scores: bool, print_all: bool, docker, logger):
_LE = LoggingConfigEnum()
_LP = LigandPreparationEnum()
_DE = DockingConfigurationEnum()
if print_scores:
_RK = ResultKeywordsEnum()
scores = docker.get_scores(best_only=not print_all)
for score in scores:
print(score, end="\n")
logger.log(f"Printed {len(scores)} scores to console (print_all set to {print_all}).", _LE.DEBUG)
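The write-out helpers above rely on `in_keys` and `nested_get` from `dockstream.utils.general_utils`; a minimal stdlib sketch of the nested-lookup pattern they provide (an assumption about their behavior, not the actual DockStream implementation):

```python
def nested_get(dictionary, keys, default=None):
    """Walk `keys` into nested dictionaries, returning `default` on any miss."""
    current = dictionary
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

config = {"output": {"scores": {"overwrite": True}}}
print(nested_get(config, ["output", "scores", "overwrite"], default=False))  # True
print(nested_get(config, ["output", "poses", "path"]))                       # None
```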
from dockstream.utils.enums.target_preparation_enum import TargetPreparationEnum
from dockstream.utils.enums.docking_enum import DockingConfigurationEnum, ResultKeywordsEnum
class AutodockVinaDockingConfigurationEnum(DockingConfigurationEnum):
ADV_RECEPTOR_PTBQT_PATH = "receptor_pdbqt_path"
ADV_SEED = "seed"
ADV_SEARCH_SPACE = "search_space"
ADV_SEARCH_SPACE_CENTER_X = "--center_x"
ADV_SEARCH_SPACE_CENTER_Y = "--center_y"
ADV_SEARCH_SPACE_CENTER_Z = "--center_z"
ADV_SEARCH_SPACE_SIZE_X = "--size_x"
ADV_SEARCH_SPACE_SIZE_Y = "--size_y"
ADV_SEARCH_SPACE_SIZE_Z = "--size_z"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class AutodockVinaExecutablesEnum:
# executable "vina" + parameters
# ---------
VINA = "vina"
VINA_CALL = "vina" # the binary call
VINA_HELP = "--help" # display usage summary
VINA_HELP_ADVANCED = "--help_advanced" # display usage summary (with all options)
VINA_VERSION = "--version" # display program version
VINA_VERSION_IDENTIFICATION_STRING = "AutoDock Vina 1.1.2" # string, which needs to be present in help output in
# order to assume "AutoDock Vina" can be properly used
VINA_CONFIGURATION = "--config" # path to configuration file, where options below can be put
# input
VINA_RECEPTOR = "--receptor" # rigid part of the receptor (PDBQT)
VINA_LIGAND = "--ligand" # ligand (PDBQT); only one at a time
VINA_FLEX = "--flex" # flexible side chains, if any (PDBQT)
# search space
VINA_CENTER_X = "--center_x" # X coordinate of the center
VINA_CENTER_Y = "--center_y" # Y coordinate of the center
VINA_CENTER_Z = "--center_z" # Z coordinate of the center
VINA_SIZE_X = "--size_x" # size in the X dimension (Angstroms)
VINA_SIZE_Y = "--size_y" # size in the Y dimension (Angstroms)
VINA_SIZE_Z = "--size_z" # size in the Z dimension (Angstroms)
# output
VINA_OUT = "--out" # output models (PDBQT), the default is chosen based on the
# ligand file name
VINA_LOG = "--log" # optionally, write log file
# advanced options
VINA_SCORE_ONLY = "--score_only" # score only - search space can be omitted
VINA_LOCAL_ONLY = "--local_only" # do local search only
VINA_RANDOMIZE_ONLY = "--randomize_only" # randomize input, attempting to avoid clashes
VINA_WEIGHT_GAUSS1 = "--weight_gauss1" # gauss_1 weight (default: -0.035579)
VINA_WEIGHT_GAUSS2 = "--weight_gauss2" # gauss_2 weight (default: -0.005156)
VINA_WEIGHT_REPULSION = "--weight_repulsion" # repulsion weight (default: 0.84024500000000002)
VINA_WEIGHT_HYDROPHOBIC = "--weight_hydrophobic" # hydrophobic weight (-0.035069000000000003)
VINA_WEIGHT_HYDROGEN = "--weight_hydrogen" # hydrogen bond weight (-0.58743900000000004)
VINA_WEIGHT_ROT = "--weight_rot" # N_rot weight (default: 0.058459999999999998)
# miscellaneous (optional)
VINA_CPU = "--cpu" # the number of CPUs to use (the default is to try to detect
# the number of CPUs or, failing that, use 1)
VINA_SEED = "--seed" # explicit random seed
VINA_EXHAUSTIVENESS = "--exhaustiveness" # exhaustiveness of the global search (roughly proportional
# to time): 1+ (default: 8)
VINA_NUM_MODES = "--num_modes" # maximum number of binding modes to generate (default: 9)
VINA_ENERGY_RANGE = "--energy_range" # maximum energy difference between the best binding mode and the
# worst one displayed [kcal/mol] (default: 3)
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class AutodockVinaOutputEnum:
ADV_PDBQT = ".pdbqt"
# the score is part of a tag in the PDBQT -> SDF translated output (tag "REMARK"), which looks like this:
# < REMARK >
# VINA RESULT: -9.1 0.000 0.000
# Name = /tmp/tmpjssiy8z4.pdb
# ...
# Note that the three values are: affinity [kcal/mol] | dist from best mode (rmsd l.b.) | rmsd (u. b.)
REMARK_TAG = "REMARK"
RESULT_LINE_IDENTIFIER = "VINA RESULT"
RESULT_LINE_POS_SCORE = 2
RESULT_LINE_POS_RMSDTOBEST_LB = 3
RESULT_LINE_POS_RMSDTOBEST_UB = 4
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class AutodockTargetPreparationEnum(TargetPreparationEnum):
ADV_PDBQT = ".pdbqt"
RECEPTOR_PATH = "receptor_path"
PH = "pH"
EXTRACT_BOX = "extract_box"
EXTRACT_BOX_REFERENCE_LIGAND_PATH = "reference_ligand_path"
EXTRACT_BOX_REFERENCE_LIGAND_FORMAT = "reference_ligand_format"
EXTRACT_BOX_REFERENCE_LIGAND_FORMAT_PDB = "PDB"
EXTRACT_BOX_REFERENCE_LIGAND_FORMAT_SDF = "SDF"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class AutodockResultKeywordsEnum(ResultKeywordsEnum):
"""This "Enum" serves to store all keywords for "AutoDock Vina" result strings."""
SDF_TAG_SCORE = "SCORE"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
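The `__setattr__` guard repeated across these enum classes makes the constants effectively read-only at the instance level; a minimal standalone sketch of the pattern (class and constant names are illustrative):

```python
class ReadOnlyConstants:
    """Constants exposed as class attributes; instance-level writes rejected,
    following the pattern used by the DockStream "Enum" classes."""
    VINA = "vina"
    VINA_SEED = "--seed"

    # prohibit any attempt to set any values
    def __setattr__(self, key, value):
        raise ValueError("No changes allowed.")

constants = ReadOnlyConstants()
print(constants.VINA_SEED)  # --seed
try:
    constants.VINA_SEED = "--other"
except ValueError as error:
    print(error)  # No changes allowed.
```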
from dockstream.utils.enums.ligand_preparation_enum import LigandPreparationEnum
class CorinaLigandPreparationEnum(LigandPreparationEnum):
OUTPUT_FORMAT_MAE = "MAE"
D_OPTIONS = "d_options"
ENUMERATE_STEREO = "enumerate_stereo"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class CorinaOutputEnum:
"""This "Enum" serves to store all keywords that are used by the "corina" executable."""
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class CorinaExecutablesEnum:
"""This "Enum" serves to store all the executables (and parameters) as strings available in the "Corina" backend."""
# executable "corina" + parameters
# ---------
CORINA = "corina"
CORINA_I = "-i"
CORINA_T_SMILES = "t=smiles"
CORINA_O = "-o"
CORINA_T_SDF = "t=sdf"
CORINA_D = "-d"
CORINA_HELP = "-h"
CORINA_HELP_IDENTIFICATION_STRING = "3D-structure generator" # if string found in "stderr" of result, "corina"
# is available
CORINA_T = "-t" # trace level (results in "corina.trc" file)
CORINA_T_DISABLED = "n"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
from dockstream.utils.enums.target_preparation_enum import TargetPreparationEnum
from dockstream.utils.enums.ligand_preparation_enum import LigandPreparationEnum
from dockstream.utils.enums.docking_enum import DockingConfigurationEnum, ResultKeywordsEnum
class GoldLigandPreparationEnum(LigandPreparationEnum):
REMOVE_UNKNOWN_ATOMS = "remove_unknown_atoms" # default: True; Whether or not to remove unknown atoms
ASSIGN_BOND_TYPES = "assign_bond_types" # default: True; Whether or not to assign bond types
STANDARDISE_BOND_TYPES = "standardise_bond_types" # default: False; Whether or not to standardise bonds to CSD conventions
ADD_HYDROGENS = "add_hydrogens" # default: True; Whether hydrogens need to be added
PROTONATE = "protonate" # default: True; Whether protonation rules need to be applied
PROTONATION_RULES_FILE = "protonation_rules_file" # default: None; Location of a file containing protonation rules
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class GoldTargetPreparationEnum(TargetPreparationEnum):
OUTPUT_RECEPTORPATH = "receptor_path"
CAVITY_REFERENCE_DISTANCE = "distance"
CAVITY_METHOD_POINT = "point"
CAVITY_POINT_ORIGIN = "origin"
CAVITY_POINT_DISTANCE = "distance"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class GoldExecutablesEnum:
"""This "Enum" serves to store all the executables (and parameters) as strings available in the "Gold" module."""
GOLD_AUTO = "gold_auto"
GOLD_AUTO_HELP = "-h"
GOLD_AUTO_HELP_IDENTIFICATION_STRING = "Usage: gold_auto"
GOLD_AUTO_CONFIG_NAME = "gold_auto.config"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class GoldTargetKeywordEnum(GoldTargetPreparationEnum):
VERSION = "version"
CURRENT_VERSION = 1.0
TARGET_PDB = "target_pdb"
TARGET_PDB_FILENAME = "target_pdb_filename"
REFERENCE_LIGAND = "reference_ligand"
REFERENCE_LIGAND_FILENAME = "reference_ligand_filename"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class GoldDockingConfigurationEnum(DockingConfigurationEnum):
RECEPTOR_PATHS = "receptor_paths"
FITNESS_FUNCTION = "fitness_function"
FITNESS_FUNCTION_GOLDSCORE = "goldscore"
FITNESS_FUNCTION_CHEMSCORE = "chemscore"
FITNESS_FUNCTION_ASP = "asp"
FITNESS_FUNCTION_PLP = "plp"
EARLY_TERMINATION = "early_termination"
DIVERSE_SOLUTIONS = "diverse_solutions"
NDOCKS = "ndocks" # number of docking attempts per ligand
AUTOSCALE = "autoscale" # "very fast": 10
# "fast": 25
# "medium": 50
# "slow": 75
# "very slow": 100
GOLD_RESPONSE_VALUE = "response_value" # set to either "value" or "fitness" to use either as response
GOLD_RESPONSE_VALUE_FITNESS = "fitness"
GOLD_RESPONSE_VALUE_VALUE = "value" # the "value" (and whether positive or negative values are "better")
# depend on the fitness function chosen
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class GoldResultKeywordsEnum(ResultKeywordsEnum):
"""This "Enum" serves to store all keywords for "Gold" result dictionaries."""
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class GoldOutputEnum:
"""This "Enum" serves to store all keywords that are used by the "Gold" module."""
TAG = "tag"
BEST = "best"
DICT_FITNESS = {"chemscore": {TAG: "Gold.Chemscore.Fitness", BEST: "max"},
"plp": {TAG: "Gold.PLP.Fitness", BEST: "max"},
"goldscore": {TAG: "Gold.Goldscore.Fitness", BEST: "max"},
"asp": {TAG: "Gold.ASP.Fitness", BEST: "max"}}
DICT_VALUE = {"asp": {TAG: "Gold.ASP.ASP", BEST: "max"},
"chemscore": {TAG: "Gold.Chemscore.DG", BEST: "min"},
"plp": {TAG: "Gold.PLP.PLP", BEST: "min"},
"goldscore": None}
# The following values are "the-higher-the-better":
# all fitness scores, Gold.ASP.ASP
# The following values are "the-lower-the-better":
# Gold.Chemscore.DG, Gold.PLP.PLP
# For "goldscore", there is no "value" version.
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
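The two lookup tables in `GoldOutputEnum` pair each fitness function with an SDF tag and an aggregation direction. A sketch of how they could be consulted, depending on whether the configured response is "fitness" or "value" (the helper name `select_score_spec` is illustrative, not DockStream's actual parser code):

```python
# Copies of GoldOutputEnum's lookup tables, keyed by fitness function.
DICT_FITNESS = {"chemscore": {"tag": "Gold.Chemscore.Fitness", "best": "max"},
                "plp": {"tag": "Gold.PLP.Fitness", "best": "max"},
                "goldscore": {"tag": "Gold.Goldscore.Fitness", "best": "max"},
                "asp": {"tag": "Gold.ASP.Fitness", "best": "max"}}
DICT_VALUE = {"asp": {"tag": "Gold.ASP.ASP", "best": "max"},
              "chemscore": {"tag": "Gold.Chemscore.DG", "best": "min"},
              "plp": {"tag": "Gold.PLP.PLP", "best": "min"},
              "goldscore": None}


def select_score_spec(fitness_function: str, response_value: str) -> dict:
    """Pick the SDF tag and best-direction for the configured response."""
    table = DICT_FITNESS if response_value == "fitness" else DICT_VALUE
    spec = table[fitness_function]
    if spec is None:
        # e.g. "goldscore" has no "value" variant
        raise ValueError(f"no 'value' response defined for {fitness_function}")
    return spec


spec = select_score_spec("chemscore", "value")
print(spec["tag"], spec["best"])   # Gold.Chemscore.DG min
```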
from dockstream.utils.enums.target_preparation_enum import TargetPreparationEnum
from dockstream.utils.enums.ligand_preparation_enum import LigandPreparationEnum
from dockstream.utils.enums.docking_enum import DockingConfigurationEnum, ResultKeywordsEnum
class OpenEyeHybridLigandPreparationEnum(LigandPreparationEnum):
# align using OpenEye's template version, which is set at the receptor building stage
ALIGN_MODE_OPENEYERECEPTOR = "OpenEye_receptor"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class OpenEyeHybridTargetPreparationEnum(TargetPreparationEnum):
OUTPUT_RECEPTORPATH = "receptor_path"
CAVITY_METHOD_BOX = "box"
CAVITY_BOX_LIMITS = "limits"
CAVITY_METHOD_HINT = "hint"
CAVITY_HINT_COORDINATES = "coordinates"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class OpenEyeHybridExecutablesEnum:
"""This "Enum" serves to store all the executables (and parameters) as strings available in the "OpenEye Hybrid" module."""
OE_HYBRID_MODULE_LOAD = "module load oedocking"
HYBRID = "hybrid"
OE_HYBRID_HELP_SIMPLE = "--help simple" # returns simple list of parameters
OE_HYBRID_HELP_ALL = "--help all" # returns complete list of parameters
OE_HYBRID_HELP_DEFAULTS = "--help defaults" # returns the default values for all parameters
OE_HYBRID_HELP_HTML = "--help html" # creates an html help file for OE Hybrid
OE_HYBRID_HELP_VERSIONS = "--help versions" # lists toolkits and versions used for OE Hybrid
OE_HYBRID_HELP_IDENTIFICATION_STRING = "To cite HYBRID" # string to identify whether OE Hybrid is available for a docking job
# required parameters
# -------------------
RECEPTOR = "-receptor" # required: receptor file for docking. Must contain bound ligand
DBASE = "-dbase" # required: database; ligands to dock. Ligand must be in 3D format
# optional parameters
# -------------------
PARAM = "-param" # text file containing parameters for docking job
MOLNAMES = "-molnames" # text file containing molecule names corresponding to ligands in the dbase parameter. Only ligands with matched names will be docked
DOCK_RESOLUTION = "-dock_resolution" # controls docking resolution, default: standard
DOCKED_MOLECULE_FILE = "-docked_molecule_file" # file to write the docked molecules to, default: docked.oeb.gz
UNDOCKED_MOLECULES_FILE = "-undocked_molecule_file" # file to write the unsuccessfully docked molecules to, default: undocked.oeb.gz
SCORE_FILE = "-score_file" # file to write the docked molecule names and scores to, default: score.txt
REPORT_FILE = "-report_file" # file to write the text report of the docking run to, default: report.txt
SETTINGS_FILE = "-settings_file" # file to write the settings used for the docking run, default: settings.param
STATUS_FILE = "-status_file" # file to write the status of the docking run which is updated and overwritten every few seconds, default: status.txt
HITLIST_SIZE = "-hitlist_size" # number of top-scoring molecules to output; excess molecules are discarded.
# "0" denotes "serial mode", in which all molecules are output unsorted, default: 500
NUM_POSES = "-num_poses" # specifies the number of docked poses to output for each molecule
SCORE_TAG = "-score_tag" # parameter specifies tag to use when storing molecule scores, default: HYBRID Chemgauss4 Score
ANNOTATE_SCORES = "-annotate_scores" # specifies whether to add VIDA (OpenEye's molecular visualization program) score annotations to processed molecules, default: false
SAVE_COMPONENT_SCORES = "-save_component_scores" # specifies whether the individual components of the total score for each pose are saved to the score file
NO_EXTRA_OUTPUT_FILES = "-no_extra_output_files" # controls the output files of the docking run; if "true", only the docked molecule file is written, default: false
NO_DOTS = "-no_dots" # specifies whether a dot/"x" is written for standard error/failed docking molecules, default: false
PREFIX = "-prefix" # parameter specifies prefix to use for all output files, default: hybrid
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class OpenEyeHybridDockingConfigurationEnum(DockingConfigurationEnum):
RECEPTOR_PATHS = "receptor_paths"
NUMBER_POSES = "number_poses"
# scoring functions
# ---------
SCORING = "scoring"
SCORING_INVALID_VALUE = 16777215
# McGann2003: shape-based scoring function that favours poses that complement the active site well, ignoring any
# chemical interactions; good choice to ensure shape-complementarity
SCORING_SHAPEGAUSS = "Shapegauss"
# Verkhivker2000: Piecewise Linear Potential uses both shape and hydrogen bond complementarity; in the implementation
# used, it also includes metal-based interactions
SCORING_PLP = "PLP"
# Eldridge1997: includes lipophilic, H-bonds, metals, clashes, rotatable bonds
SCORING_CHEMSCORE = "Chemscore"
# the Chemgauss-scoring functions use Gaussian smoothed potentials to measure complementarity; includes shape,
# H-bonds between ligand and protein, H-bonds with implicit solvent and metal interactions; version 4 is an
# improvement in terms of H-bonding
SCORING_CHEMGAUSS3 = "Chemgauss3"
SCORING_CHEMGAUSS4 = "Chemgauss4"
SCORING_HYBRID1 = "Hybrid1"
SCORING_HYBRID2 = "Hybrid2"
# resolution (specifies the search resolution during the exhaustive search and local optimization, as well as
# the number of poses passed from the exhaustive step to the optimization step)
# ---------
RESOLUTION = "resolution"
RESOLUTION_INVALID_VALUE = 16777215
RESOLUTION_HIGH = "High" # 1000 poses passed
RESOLUTION_STANDARD = "Standard" # 100 poses passed
RESOLUTION_LOW = "Low" # 100 poses passed
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class OpenEyeHybridResultKeywordsEnum(ResultKeywordsEnum):
"""This "Enum" serves to store all keywords for "OpenEye Hybrid" result dictionaries."""
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class OpenEyeHybridOutputKeywordsEnum:
"""This "Enum" serves to store all "OpenEye Hybrid" output keywords."""
# there are six output files by default; in some cases (e.g. docked poses), the file
# extension provided determines the output format
SCORE = "HYBRID Chemgauss4 score"
DOCKED_MOLECULES_SDF_OUTPUT = "docked_molecules.sdf" # specifying sdf here removes need for OpenBabel conversion of oeb to sdf
UNDOCKED_MOLECULES_SDF_OUTPUT = "undocked_molecules.sdf" # specifying sdf here removes need for OpenBabel conversion of oeb to sdf
SCORE_FILE_OUTPUT = "score.txt"
REPORT_FILE_OUTPUT = "report.txt"
SETTINGS_FILE_OUTPUT = "settings.param"
STATUS_FILE_OUTPUT = "status.txt"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class OpenBabelOutputEnum:
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class OpenBabelExecutablesEnum:
# executable "obabel" + parameters
# ---------
OBABEL = "obabel"
OBABEL_IDENTIFICATION_STRING = "-O<outfilename>"
OBABLE_INPUTFORMAT_PDBQT = "-ipdbqt" # sets the input format to "PDBQT" (output of "AutoDock Vina")
OBABEL_P = "-p" # sets the <pH> value (e.g. "-p 7.4") for protonation
# note that this overrides "--addpolarh", which is therefore not used
OBABEL_O = "-O" # specifies the output path (directly pasted afterwards, e.g. "-Omypath.pdb")
OBABEL_OUTPUT_FORMAT_PDBQT = "-opdbqt" # sets the output format to "PDBQT" (input for "AutoDock Vina")
OBABEL_OUTPUT_FORMAT_SDF = "-osdf" # sets the output format to "SDF"
OBABEL_X = "-x" # specifies generation options
OBABEL_X_R = 'r' # one of the 'X' options ("-x"), which disables the tree construction of the receptor (makes it static), directly pasted together: e.g. "-xr"
OBABEL_PARTIALCHARGE = "--partialcharge" # sets the partial charge generation method (execute "obabel -L charges" to see list of available methods)
OBABEL_PARTIALCHARGE_GASTEIGER = "gasteiger" # one method to compute the partial charges, used as: "--partialcharge gasteiger"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
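To illustrate how the `obabel` flags above compose, here is a sketch of a command line converting an "AutoDock Vina" PDBQT pose back to SDF; the file names are placeholders:

```python
# Assemble an "obabel" call from the flags defined above; note that the
# output path is pasted directly after "-O" (no space), as documented.
command = " ".join([
    "obabel",                # OBABEL
    "-ipdbqt",               # input format: PDBQT (output of "AutoDock Vina")
    "docked_pose.pdbqt",     # placeholder input file
    "-osdf",                 # output format: SDF
    "-Odocked_pose.sdf",     # OBABEL_O + placeholder output path
])
print(command)   # obabel -ipdbqt docked_pose.pdbqt -osdf -Odocked_pose.sdf
```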
from dockstream.utils.enums.target_preparation_enum import TargetPreparationEnum
from dockstream.utils.enums.ligand_preparation_enum import LigandPreparationEnum
from dockstream.utils.enums.docking_enum import DockingConfigurationEnum, ResultKeywordsEnum
class OpenEyeLigandPreparationEnum(LigandPreparationEnum):
# align using OpenEye's template version, which is set at the receptor building stage
ALIGN_MODE_OPENEYERECEPTOR = "OpenEye_receptor"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class OpenEyeTargetPreparationEnum(TargetPreparationEnum):
OUTPUT_RECEPTORPATH = "receptor_path"
CAVITY_METHOD_BOX = "box"
CAVITY_BOX_LIMITS = "limits"
CAVITY_METHOD_HINT = "hint"
CAVITY_HINT_COORDINATES = "coordinates"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class OpenEyeDockingConfigurationEnum(DockingConfigurationEnum):
RECEPTOR_PATHS = "receptor_paths"
# scoring functions
# ---------
SCORING = "scoring"
SCORING_INVALID_VALUE = 16777215
# McGann2003: shape-based scoring function that favours poses that complement the active site well, ignoring any
# chemical interactions; good choice to ensure shape-complementarity
SCORING_SHAPEGAUSS = "Shapegauss"
# Verkhivker2000: Piecewise Linear Potential uses both shape and hydrogen bond complementarity; in the implementation
# used, it also includes metal-based interactions
SCORING_PLP = "PLP"
# Eldridge1997: includes lipophilic, H-bonds, metals, clashes, rotatable bonds
SCORING_CHEMSCORE = "Chemscore"
# the Chemgauss-scoring functions use Gaussian smoothed potentials to measure complementarity; includes shape,
# H-bonds between ligand and protein, H-bonds with implicit solvent and metal interactions; version 4 is an
# improvement in terms of H-bonding
SCORING_CHEMGAUSS3 = "Chemgauss3"
SCORING_CHEMGAUSS4 = "Chemgauss4"
SCORING_HYBRID1 = "Hybrid1"
SCORING_HYBRID2 = "Hybrid2"
# resolution (specifies the search resolution during the exhaustive search and local optimization, as well as
# the number of poses passed from the exhaustive step to the optimization step)
# ---------
RESOLUTION = "resolution"
RESOLUTION_INVALID_VALUE = 16777215
RESOLUTION_HIGH = "High" # 1000 poses passed
RESOLUTION_STANDARD = "Standard" # 100 poses passed
RESOLUTION_LOW = "Low" # 100 poses passed
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class OpenEyeResultKeywordsEnum(ResultKeywordsEnum):
"""This "Enum" serves to store all keywords for "OpenEye" result dictionaries."""
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
from dockstream.utils.enums.ligand_preparation_enum import LigandPreparationEnum
class RDkitLigandPreparationEnum(LigandPreparationEnum):
TAG_RDOCK_TETHERED_ATOMS = "TETHERED ATOMS"
# 3D coordinates generation
# ---------
EP_PARAMS_COORDGEN = "coordinate_generation"
EP_PARAMS_COORDGEN_METHOD = "method"
EP_PARAMS_COORDGEN_UFF = "UFF"
EP_PARAMS_COORDGEN_UFF_MAXITERS = "maximum_iterations"
# tie molecules to a reference during docking
ALIGN_TETHERING = "tethering"
# keywords for molecule tags
# ---------
TAG_ALIGNED_ATOMS = "ALIGNED ATOMS"
TAG_ALIGNED_RMSD = "ALIGNED RMSD"
TAG_ALIGNED_REFERENCE = "ALIGNED REFERENCE"
TAG_ALIGNED_ATOMS_RATIO = "ALIGNED ATOMS RATIO"
PROTONATE = "protonate"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class AnalysisConfigurationEnum:
ANALYSIS = "analysis"
# RMSD
# ---------
RMSD = "rmsd"
RMSD_DATA = "data"
RMSD_DATA_MOLECULES_PATH = "molecules_path"
RMSD_DATA_NAME = "name"
RMSD_OUTPUT = "output"
RMSD_OUTPUT_SUMMARY_PATH = "summary_path"
RMSD_OUTPUT_DETAILS_PATH = "details_path"
RMSD_OUTPUT_HEATMAP_PATH = "heatmap_path"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class AnalysisInternalEnum:
DATA_NAME = "name"
MOLECULES = "molecules"
FIRST_SET = "first_set"
SECOND_SET = "second_set"
LIST_RMSD_VALUES = "list_rmsd_values"
MEAN_RMSD_VALUES = "mean_rmsd_values"
SD_RMSD_VALUES = "sd_rmsd_values"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class AnalysisEnum:
"""This "Enum" defines all the strings required in the analysis script."""
# Input Docking Data
# --------------------------------------
INPUT_DOCKING_DATA = "input_docking_data"
DATA_PATH = "data_path"
LIGAND_NUMBER = "ligand_number"
DATA_METRIC = "data_metric"
MAX_DATA_METRIC_BEST = "max_data_metric_best"
DATA_THRESHOLDS = "data_thresholds"
# Input Experimental Data
# --------------------------------------
INPUT_EXP_DATA = "input_exp_data"
EXP_DATA_PATH = "exp_data_path"
EXP_METRIC = "exp_metric"
COMPARISON_SCORE = "comparison_score"
MAX_EXP_METRIC_BEST = "max_exp_metric_best"
EXP_THRESHOLDS = "exp_thresholds"
# Input Binary Actives/Inactives Data
# ---------------------------------------
INPUT_ENRICHMENT_DATA = "input_enrichment_data"
DATA_PATH_ACTIVES = "data_path_actives"
DATA_PATH_INACTIVES = "data_path_inactives"
ACTIVES_DATA_METRIC = "actives_data_metric"
INACTIVES_DATA_METRIC = "inactives_data_metric"
MAX_METRIC_BEST = "max_metric_best"
ACTIVES = "Actives"
INACTIVES = "Inactives"
# Plots
# ---------------------------------------
PLOT_SETTINGS = "plot_settings"
ENRICHMENT_ANALYSIS = "enrichment_analysis"
PROC_OVERLAY = "pROC_overlay"
# Output Folder
# ---------------------------------------
OUTPUT = "output"
OUTPUT_PATH = "output_path"
# Histogram Plot Parameters
# ---------------------------------------
HIST_TWO_COLOURS = ['green', 'blue']
HIST_THREE_COLOURS = ['green', 'orange', 'blue']
HIST_FOUR_COLOURS = ['green', 'orange', 'blue', 'cyan']
class DockingConfigurationEnum:
"""This "Enum" serves to store all the strings used in parsing "DockStream" configurations."""
DOCKING = "docking"
# header region
# ---------
HEADER = "header"
ENVIRONMENT = "environment"
ENVIRONMENT_EXPORT = "export"
ENVIRONMENT_EXPORT_KEY = "key"
ENVIRONMENT_EXPORT_VALUE = "value"
LOGGING = "logging"
LOGGING_VERBOSITY = "verbosity"
LOGGING_LOGFILE = "logfile"
DOCKING_RUNS = "docking_runs"
RUN_ID = "run_id"
INPUT_POOLS = "input_pools"
PARAMS = "parameters"
PARAMS_PREFIX_EXECUTION = "prefix_execution"
PARAMS_BINARY_LOCATION = "binary_location"
# parallelization
# ---------
PARALLELIZATION = "parallelization"
PARALLELIZATION_NUMBER_CORES = "number_cores"
PARALLELIZATION_MAXCOMPOUNDSPERSUBJOB = "max_compounds_per_subjob"
# the different backend types
# ---------
BACKEND = "backend"
BACKEND_RDOCK = "rDock"
BACKEND_OPENEYE = "OpenEye"
BACKEND_OPENEYEHYBRID = "Hybrid"
BACKEND_AUTODOCKVINA = "AutoDockVina"
BACKEND_GOLD = "Gold"
BACKEND_GLIDE = "Glide"
# structural alignment to reference
# ---------
ALIGN = "align"
ALIGN_REFERENCE_PATHS = "reference_paths"
ALIGN_MINIMUM_SUBSTRUCTURE_RATIO = "minimum_substructure_ratio"
ALIGN_COMPLETE_RINGS_ONLY = "complete_rings_only"
# define what to do in cases where no alignment can be made
ALIGN_FAIL_ACTION = "fail_action"
ALIGN_FAIL_DISCARD = "discard"
ALIGN_FAIL_KEEP = "keep"
# output
OUTPUT = "output"
OUTPUT_POSES = "poses"
OUTPUT_POSES_OVERWRITE = "overwrite"
OUTPUT_POSES_PATH = "poses_path"
OUTPUT_SCORES = "scores"
OUTPUT_SCORES_OVERWRITE = "overwrite"
OUTPUT_SCORES_PATH = "scores_path"
OUTPUT_MODE = "mode"
OUTPUT_MODE_ALL = "all"
OUTPUT_MODE_BESTPERLIGAND = "best_per_ligand"
OUTPUT_MODE_BESTPERENUMERATION = "best_per_enumeration"
# number poses returned from docking (top X poses)
NUMBER_POSES = "number_poses"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
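The keys in `DockingConfigurationEnum` map onto the JSON configuration DockStream parses. A skeleton built from those keys, with placeholder values (run id, paths) and a nesting that is a sketch rather than a normative schema:

```python
# Illustrative skeleton of a docking configuration, using the string keys
# defined above; all concrete values here are placeholders.
configuration = {
    "docking": {
        "header": {
            "logging": {"logfile": "dockstream.log"},
        },
        "docking_runs": [{
            "backend": "AutoDockVina",
            "run_id": "run_1",
            "input_pools": ["pool_1"],
            "parameters": {},
            "output": {
                "poses": {"poses_path": "poses.sdf", "overwrite": True},
                "scores": {"scores_path": "scores.csv", "overwrite": True},
            },
        }],
    },
}
print(sorted(configuration["docking"].keys()))
```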
class ResultKeywordsEnum:
"""This "Enum" serves to store all keywords for result dictionaries."""
# ResultParser::get_result() result dataframe
# ---------
DF_LIGAND_NAME = "name"
DF_LIGAND_NAME_MOLECULE = ""
DF_LIGAND_NAME_CONFORMER = ""
DF_LIGAND_NUMBER = "ligand_number"
DF_LIGAND_ENUMERATION = "enumeration"
DF_CONFORMER = "conformer_number"
DF_SCORE = "score"
DF_SMILES = "smiles"
DF_LOWEST_CONFORMER = "lowest_conformer"
AGGREGATE_BEST = "best"
AGGREGATE_AVERAGE = "average"
# fixed values
# ---------
FIXED_VALUE_NA = "NA"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class LigandPreparationEnum:
"""This "Enum" serves to store all the strings used in parsing "DockStream" configurations concerning ligands."""
LIGAND_PREPARATION = "ligand_preparation"
# the embedding (3D coordinate generation)
EMBEDDING_POOLS = "embedding_pools"
POOLID = "pool_id"
MOLECULES = "molecules"
PARAMS = "parameters"
# input parameters
INPUT = "input"
INPUT_STANDARDIZE_SMILES = "standardize_smiles"
INPUT_PATH = "input_path"
INPUT_TYPE = "type"
INPUT_TYPE_CONSOLE = "CONSOLE"
INPUT_TYPE_LIST = "LIST"
INPUT_TYPE_SMI = "SMI"
INPUT_TYPE_CSV = "CSV"
INPUT_TYPE_SDF = "SDF"
PREFIX_EXECUTION = "prefix_execution"
BINARY_LOCATION = "binary_location"
INITIALIZATION_MODE = "initialization_mode"
INITIALIZATION_MODE_ORDER = "order"
INITIALIZATION_MODE_AZDOCK = "dockstream"
# CSV input
INPUT_CSV_DELIMITER = "delimiter"
INPUT_CSV_DELIMITER_DEFAULT = ','
INPUT_CSV_COLUMNS = "columns"
INPUT_CSV_COLNAME_SMILES = "smiles"
INPUT_CSV_COLNAME_NAMES = "names"
# SDF input
INPUT_SDF_TAGS = "tags"
INPUT_SDF_TAGNAME_NAMES = "names"
# output parameters
OUTPUT = "output"
OUTPUT_CONFORMERPATH = "conformer_path"
OUTPUT_FORMAT = "format"
OUTPUT_FORMAT_SDF = "SDF"
OUTPUT_FORMAT_MOL2 = "MOL2"
# TautEnum can be used to prepare the smiles (input)
# ---------
USE_TAUT_ENUM = "use_taut_enum"
TAUT_ENUM_PREFIX_EXECUTION = "prefix_execution"
TAUT_ENUM_ENUMERATE_PROTONATION = "enumerate_protonation"
TAUT_ENUM_BINARY_LOCATION = "binary_location"
# the different types of embedding
TYPE = "type"
TYPE_RDKIT = "RDkit"
TYPE_OPENEYE = "OpenEye"
TYPE_CORINA = "Corina"
TYPE_LIGPREP = "Ligprep"
TYPE_GOLD = "Gold"
TYPE_OMEGA = "Omega"
# structural alignment to reference ("internal" method)
# ---------
ALIGN = "align"
ALIGN_MODE = "mode"
ALIGN_MODE_INTERNAL = "internal"
ALIGN_REFERENCE_PATHS = "reference_paths"
ALIGN_REFERENCE_FORMAT = "reference_format"
ALIGN_REFERENCE_FORMAT_SDF = "SDF"
ALIGN_REFERENCE_FORMAT_PDB = "PDB"
ALIGN_MINIMUM_SUBSTRUCTURE_RATIO = "minimum_substructure_ratio"
ALIGN_COMPLETE_RINGS_ONLY = "complete_rings_only"
# define what to do in cases where no alignment can be made
ALIGN_FAIL_ACTION = "fail_action"
ALIGN_FAIL_DISCARD = "discard"
ALIGN_FAIL_KEEP = "keep"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
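Analogously, the `LigandPreparationEnum` keys describe an embedding-pool block. A minimal sketch with placeholder paths and pool id:

```python
# Illustrative embedding-pool block using the keys defined above; the
# SMILES file path, pool id and output path are placeholders.
ligand_preparation = {
    "ligand_preparation": {
        "embedding_pools": [{
            "pool_id": "pool_1",
            "type": "RDkit",                       # one of the embedding types
            "input": {"type": "SMI", "input_path": "ligands.smi"},
            "output": {"conformer_path": "embedded.sdf", "format": "SDF"},
        }],
    },
}
print(ligand_preparation["ligand_preparation"]["embedding_pools"][0]["type"])
```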
class LoggingConfigEnum:
"""This "Enum" serves to store all paths and keywords used to configure the loggers."""
# set levels (for now, they match the "logging" default ones)
DEBUG = "debug"
INFO = "info"
WARNING = "warning"
ERROR = "error"
EXCEPTION = "exception"
# paths to the configuration JSONs that are shipped with DockStream
PATH_CONFIG_DEFAULT = "dockstream/config/logging/default.json"
PATH_CONFIG_VERBOSE = "dockstream/config/logging/verbose.json"
PATH_CONFIG_DEBUG = "dockstream/config/logging/debug.json"
# high-level loggers defined in the configurations
LOGGER_INTERFACE = "command_line_interface"
LOGGER_TARGET_PREPARATION = "target_preparation"
LOGGER_LIGAND_PREPARATION = "ligand_preparation"
LOGGER_DOCKING = "docking"
LOGGER_BLANK = "blank"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
from dockstream.utils.enums.target_preparation_enum import TargetPreparationEnum
from dockstream.utils.enums.docking_enum import DockingConfigurationEnum, ResultKeywordsEnum
class rDockTargetPreparationEnum(TargetPreparationEnum):
CAVITY_PRMFILE = "prm_file"
CAVITY_METHOD_TWOSPHERES = "two_spheres"
# strings that are replaced in the PRM file
STRING_RECEPTOR_MOL2_PATH = "<RECEPTOR_MOL2_FILE_ABSOLUTE_PATH>"
STRING_REFERENCE_LIGAND_SDF_PATH = "<REFERENCE_SDF_FILE_ABSOLUTE_PATH>"
RUNS_OUTPUT_DIRECTORY = "directory"
PRM_DEFAULT_PATH = "dockstream/config/rDock/standard_reference_ligand.prm"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class rDockDockingConfigurationEnum(DockingConfigurationEnum):
PARAMS_PRM_PATHS = "rbdock_prm_paths"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
class rDockExecutablesEnum:
"""This "Enum" serves to store all the executables (and parameters) as strings available in the "rDock" module."""
# environment variables
# ---------
RBT_ROOT = "RBT_ROOT"
RBT_HOME = "RBT_HOME"
# executable "rbdock" + parameters
# ---------
RBDOCK = "rbdock"
RBDOCK_HELP = "--help"
RBDOCK_HELP_IDENTIFICATION_STRING = "Usage: rbdock"
RBDOCK_S = "-s" # random seed
RBDOCK_S_DEFAULT = 42 # default for random seed
RBDOCK_T = "-T" # trace level for debugging
RBDOCK_N = "-n" # number of runs
RBDOCK_R = "-r" # receptor PRM file
RBDOCK_O = "-o" # output file
RBDOCK_P = "-p" # protocol file
RBDOCK_P_DEFAULT = "dock.prm" # default docking protocol, located in $RBT_ROOT/data/scripts
RBDOCK_I = "-i" # input file
# executable "rbcavity" + parameters
# ---------
RBCAVITY = "rbcavity" # generate a cavity from a receptor file
RBCAVITY_R = "-r" # specifies path to input configuration PRM file
RBCAVITY_D = "-d" # dump the grid in pymol
RBCAVITY_WAS = "-was" # write the cavity out
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
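A sketch of an `rbdock` invocation assembled from the constants above; the receptor PRM file, ligand input and output stem are placeholders:

```python
# Assemble an "rbdock" command line from the flags defined above.
args = ["rbdock",
        "-r", "receptor.prm",   # RBDOCK_R: receptor PRM file (placeholder)
        "-p", "dock.prm",       # RBDOCK_P_DEFAULT: standard docking protocol
        "-n", "10",             # RBDOCK_N: number of runs
        "-s", "42",             # RBDOCK_S_DEFAULT: random seed
        "-i", "ligands.sdf",    # RBDOCK_I: input file (placeholder)
        "-o", "results"]        # RBDOCK_O: output file stem (placeholder)
print(" ".join(args))
```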
class rDockRbdockOutputEnum:
"""This "Enum" serves to store all keywords that are used by the "rbdock" executable."""
# key values
# ---------
NAME = "Name"
SCORE = "SCORE"
# remaining values (for reference purposes only)
# ---------
CHROM_ZERO = "CHROM.0"
CHROM_ONE = "CHROM.1"
RI = "RI"
RBT_CURRENT_DIRECTORY = "Rbt.Current_Directory"
RBT_EXECUTABLE = "Rbt.Executable"
RBT_LIBRARY = "Rbt.Library"
RBT_PARAMETER_FILE = "Rbt.Parameter_File"
RBT_RECEPTOR = "Rbt.Receptor"
SCORE_INTER = "SCORE.INTER"
SCORE_INTER_CONST = "SCORE.INTER.CONST"
SCORE_INTER_POLAR = "SCORE.INTER.POLAR"
SCORE_INTER_REPUL = "SCORE.INTER.REPUL"
SCORE_INTER_ROT = "SCORE.INTER.ROT"
SCORE_INTER_VDW = "SCORE.INTER.VDW"
SCORE_INTER_NORM = "SCORE.INTER.norm"
SCORE_INTRA = "SCORE.INTRA"
SCORE_INTRA_DIHEDRAL = "SCORE.INTRA.DIHEDRAL"
SCORE_INTRA_DIHEDRAL_ZERO = "SCORE.INTRA.DIHEDRAL.0"
SCORE_INTRA_POLAR = "SCORE.INTRA.POLAR"
SCORE_INTRA_POLAR_ZERO = "SCORE.INTRA.POLAR.0"
SCORE_INTRA_REPUL = "SCORE.INTRA.REPUL"
SCORE_INTRA_REPUL_ZERO = "SCORE.INTRA.REPUL.0"
SCORE_INTRA_VDW = "SCORE.INTRA.VDW"
SCORE_INTRA_VDW_ZERO = "SCORE.INTRA.VDW.0"
SCORE_INTRA_NORM = "SCORE.INTRA.norm"
SCORE_RESTR = "SCORE.RESTR"
SCORE_RESTR_CAVITY = "SCORE.RESTR.CAVITY"
SCORE_RESTR_NORM = "SCORE.RESTR.norm"
SCORE_SYSTEM = "SCORE.SYSTEM"
SCORE_SYSTEM_CONST = "SCORE.SYSTEM.CONST"
SCORE_SYSTEM_DIHEDRAL = "SCORE.SYSTEM.DIHEDRAL"
SCORE_SYSTEM_POLAR = "SCORE.SYSTEM.POLAR"
SCORE_SYSTEM_REPUL = "SCORE.SYSTEM.REPUL"
SCORE_SYSTEM_VDW = "SCORE.SYSTEM.VDW"
SCORE_SYSTEM_NORM = "SCORE.SYSTEM.norm"
SCORE_HEAVY = "SCORE.heavy"
SCORE_NORM = "SCORE.norm"
# try to find the internal value and return
def __getattr__(self, name):
if name in self:
return name
raise AttributeError
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
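The rbdock output keywords above are written as data tags into the SD files that rDock produces, so a pose's scores can be recovered by reading those tags back. Below is a minimal, dependency-free sketch of that idea; a real pipeline would use a proper SDF parser (e.g. RDKit's `SDMolSupplier`), the tag-line handling here is simplified, and the sample record is invented for illustration.

```python
def parse_sd_tags(sd_text: str) -> list:
    """Collect the '> <TAG>' data fields of every record in an SD file string."""
    records = []
    for block in sd_text.split("$$$$"):
        tags = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            stripped = lines[i].strip()
            # data-tag header lines look like "> <SCORE>"
            if stripped.startswith("> <") and stripped.endswith(">"):
                tags[stripped[3:-1]] = lines[i + 1].strip() if i + 1 < len(lines) else ""
                i += 2
            else:
                i += 1
        if tags:
            records.append(tags)
    return records


# invented two-tag record, mimicking rbdock output (keywords as in rDockRbdockOutputEnum)
sample = """ligand_0
  comment line
> <SCORE>
-21.47

> <SCORE.INTER>
-18.03

$$$$
"""
poses = parse_sd_tags(sample)
print(poses[0]["SCORE"])  # rDock scores: lower is better
```

Note that real SD files allow extra whitespace and registry numbers on the tag header line (e.g. `>  <SCORE>  (1)`), which this sketch does not handle.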


class rDockRbcavityOutputEnum:
"""This "Enum" serves to store all keywords that are used by the "rbcavity" executable."""
DOCKING_SITE = "DOCKING SITE"

    # try to find the internal value and return
    def __getattr__(self, name):
        if name in type(self).__dict__:
            return name
        raise AttributeError(name)
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")


class rDockResultKeywordsEnum(ResultKeywordsEnum):
"""This "Enum" serves to store all keywords for "rDock" result dictionaries."""
# rDockTargetPreparator::specify_cavity() result dictionary
# ---------
SPECIFYCAVITY_BINARY_PATH = "binary_path"
SPECIFYCAVITY_GRID_PATH = "grid_path"
SPECIFYCAVITY_METADATA = "cavity_metadata"
SPECIFYCAVITY_METADATA_TOTALVOLUME = "total_volume"
SPECIFYCAVITY_METADATA_SIZEINPOINTS = "size_in_points"

    # try to find the internal value and return
    def __getattr__(self, name):
        if name in type(self).__dict__:
            return name
        raise AttributeError(name)
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")


class StereoEnumerationEnum:
    """This "Enum" serves to store all keywords used for the stereo-enumeration step."""
# stereo-enumeration
# ---------
STEREO_ENUM = "stereo_enumeration"
STEREO_ENUM_BACKEND = "stereo_backend"
STEREO_ENUM_BACKEND_RDKIT = "RDkit"
STEREO_ENUM_PARAMETERS = "parameters"
# RDkit
# ---------
RDKIT_TRY_EMBEDDING = "try_embedding"
RDKIT_UNIQUE = "unique"
RDKIT_MAX_ISOMERS = "max_isomers"

    # try to find the internal value and return
    def __getattr__(self, name):
        if name in type(self).__dict__:
            return name
        raise AttributeError(name)
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")


class TagAdditionsEnum:
    """This "Enum" serves to store the names of the tags added to written-out ligands."""
TAG_NAME = "name"
TAG_SMILES = "smiles"
TAG_ORIGINAL_SMILES = "original_smiles"
TAG_LIGAND_ID = "ligand_id"

    # try to find the internal value and return
    def __getattr__(self, name):
        if name in type(self).__dict__:
            return name
        raise AttributeError(name)
# prohibit any attempt to set any values
def __setattr__(self, key, value):
raise ValueError("No changes allowed.")
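All of the keyword containers above follow the same pattern: plain class attributes hold the keyword strings, and `__setattr__` rejects mutation so a shared instance cannot be altered at runtime. A toy, hypothetical reduction of the pattern:

```python
class _FrozenKeywords:
    """Hypothetical miniature of the keyword-container pattern used above."""
    NAME = "Name"
    SCORE = "SCORE"

    # prohibit any attempt to set any values on an instance
    def __setattr__(self, key, value):
        raise ValueError("No changes allowed.")


keys = _FrozenKeywords()
print(keys.SCORE)  # the stored keyword string

try:
    keys.SCORE = "something else"
except ValueError as e:
    print(e)  # "No changes allowed."
```

Note that this only guards instance attributes: assigning on the class itself (`_FrozenKeywords.SCORE = ...`) would still succeed, which is an inherent limitation of the approach; Python's standard `enum.Enum` gives stronger immutability guarantees.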