include utility function for drug/target embedding only
kexinhuang12345 committed Oct 28, 2020
1 parent 4ee813d commit 6b0fa02
Showing 2 changed files with 37 additions and 0 deletions.
28 changes: 28 additions & 0 deletions CONTRIBUTE.md
@@ -0,0 +1,28 @@
## Instructions on how to include a new encoder

Thank you for your interest in DeepPurpose! As more and more models appear, we want to include as many of them as possible, along with their pretrained versions, in our framework. Here we provide step-by-step instructions for doing that:


### Step 1: modify the ``utils.py`` file for data and parameters

For any dataset, we expect each drug to be associated with a SMILES string and each protein with an amino acid sequence. However, different encoders expect different model inputs (e.g., MPNN expects a molecular graph), so we first need to transform the data into the expected format. To do that, in the ``utils.py`` file, define a new function ``smiles2xxx`` or ``target2xxx`` that takes a single input SMILES/sequence and outputs its encoding.
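
As a rough illustration of such a per-input function, the sketch below defines a hypothetical ``smiles2charindex`` that maps one SMILES string to a fixed-length list of integer indices. The function name, the toy vocabulary, and the maximum length are assumptions made for this example; they are not part of DeepPurpose.

```python
# Hypothetical example: none of these names exist in DeepPurpose.
SMILES_VOCAB = ['C', 'N', 'O', '(', ')', '=', '1', '2', '#']  # toy vocabulary
# Index 0 is reserved for padding and index 1 for unknown characters.
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(SMILES_VOCAB)}
MAX_LEN = 16  # assumed fixed encoding length


def smiles2charindex(smiles):
    """Encode a single SMILES string as a fixed-length list of integer indices."""
    indices = [CHAR_TO_INDEX.get(ch, 1) for ch in smiles[:MAX_LEN]]
    # Right-pad with zeros so every molecule has the same length.
    return indices + [0] * (MAX_LEN - len(indices))
```

A real encoder would use a proper tokenizer or featurizer; the point here is only the shape of the function: one input string in, one encoding out.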

Then, in the ``encode_drug`` or ``encode_protein`` function, include an ``elif`` statement that transforms all of the data points in the input dataframe using the newly defined ``smiles2xxx`` or ``target2xxx``.
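
The dispatch step might look roughly like the following. This is a simplified sketch, not the actual ``encode_drug`` implementation: it works on a plain list of SMILES strings instead of a dataframe, and the encoding name ``'CharCount'`` and the helper ``smiles2charcount`` are hypothetical.

```python
def smiles2charcount(smiles):
    """Hypothetical per-input encoder: counts of a few atom symbols."""
    return [smiles.count(symbol) for symbol in ('C', 'N', 'O')]


def encode_drug_sketch(smiles_list, drug_encoding):
    """Encode every SMILES in smiles_list with the encoder chosen by name."""
    if drug_encoding == 'CharCount':  # hypothetical encoding name
        transform = smiles2charcount
    # elif drug_encoding == 'YourEncoder':
    #     transform = smiles2xxx
    else:
        raise AttributeError('Unknown drug encoding: ' + drug_encoding)

    # Encode each unique SMILES once, then map the results back to every row.
    unique_encodings = {s: transform(s) for s in set(smiles_list)}
    return [unique_encodings[s] for s in smiles_list]
```

Encoding only the unique inputs avoids repeating expensive featurization when the same compound appears in many rows.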

For special input formats, such as those that need further transformation on the fly, please add an ``elif`` statement to ``data_process_loader``, ``data_process_DDI_loader``, ``data_process_PPI_loader``, and ``data_process_loader_Protein_Prediction``. You can refer to the CNN examples in these functions.

Now, in the ``generate_config`` function, add an ``elif`` statement to include all important encoder parameters (e.g., input dimension, model dimension, etc.). If your encoder has new parameters that you want users to specify in the ``model_initialize`` function, also add them to the function's parameters and give them default values.
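
A minimal sketch of this pattern is shown below. The function ``generate_config_sketch``, the encoder name ``'MyEncoder'``, and all parameter and key names are assumptions for illustration and do not reproduce the real ``generate_config`` signature.

```python
def generate_config_sketch(drug_encoding,
                           my_encoder_hidden_dim=128,
                           my_encoder_num_layers=2):
    """Collect encoder hyperparameters into one config dictionary."""
    config = {'drug_encoding': drug_encoding}
    if drug_encoding == 'MyEncoder':  # hypothetical encoder name
        # New user-facing parameters get their defaults in the signature above.
        config['input_dim_drug'] = 16  # assumed input size for this example
        config['my_encoder_hidden_dim'] = my_encoder_hidden_dim
        config['my_encoder_num_layers'] = my_encoder_num_layers
    else:
        raise AttributeError('Unknown drug encoding: ' + drug_encoding)
    return config
```

Giving each new parameter a default keeps existing calls working while letting users override only what they need.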

### Step 2: modify the ``encoders.py`` for model definition

In ``encoders.py``, define the encoder model. By default, ``__init__`` should take ``encoding``, which is either 'drug' or 'protein', and ``**config``, which contains all the model parameters defined by the user. The ``forward`` function should take one feature matrix as input and output the hidden embedding.
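
For illustration, an encoder following this interface could be sketched as below. The class name ``MyEncoder`` and the config keys it reads are assumptions for this example, and the small MLP stands in for whatever architecture you are adding.

```python
import torch
from torch import nn


class MyEncoder(nn.Module):
    """Hypothetical encoder: a small MLP mapping features to an embedding."""

    def __init__(self, encoding, **config):
        super().__init__()
        # The config key names below are assumptions for this sketch.
        if encoding == 'drug':
            input_dim = config['input_dim_drug']
        elif encoding == 'protein':
            input_dim = config['input_dim_protein']
        else:
            raise AttributeError("encoding must be 'drug' or 'protein'")
        hidden_dim = config.get('my_encoder_hidden_dim', 128)
        self.layers = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, v):
        # v: tensor of shape (batch_size, input_dim)
        return self.layers(v.float())
```

Calling ``MyEncoder('drug', input_dim_drug=16)`` on a batch of shape ``(4, 16)`` would return an embedding of shape ``(4, 128)`` under these assumptions.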

### Step 3: modify the training scripts ``DTI.py, DDI.py, PPI.py, CompoundPred.py, ProteinPred.py``

Finally, we need to modify the training wrappers. Every file has a similar structure, so we describe one file and the rest follow the same pattern. In the ``__init__`` function of the main class, include an ``elif`` statement that defines the model based on the definitions in ``encoders.py``.
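
The branch in the wrapper could be sketched as follows. The classes here are plain stand-ins so the example runs on its own; ``ModelWrapperSketch`` and ``MyEncoder`` are hypothetical names, and the real training scripts contain considerably more logic.

```python
class CNN:
    """Stand-in for an existing encoder class."""

    def __init__(self, encoding, **config):
        self.encoding = encoding


class MyEncoder:
    """Stand-in for the newly added encoder class."""

    def __init__(self, encoding, **config):
        self.encoding = encoding


class ModelWrapperSketch:
    """Simplified stand-in for the main class of a training script."""

    def __init__(self, **config):
        drug_encoding = config['drug_encoding']
        if drug_encoding == 'CNN':
            self.model_drug = CNN('drug', **config)
        elif drug_encoding == 'MyEncoder':  # the newly added branch
            self.model_drug = MyEncoder('drug', **config)
        else:
            raise AttributeError('Please use one of the available encoding methods.')
        self.config = config
```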

That's it! You have successfully included your model in DeepPurpose!

### Test and Write in README file

Before you create a pull request, please also test it locally and send [email protected] a test case. Then, you are good to go!
9 changes: 9 additions & 0 deletions DeepPurpose/utils.py
@@ -915,6 +915,15 @@ def load_dict(path):
'HIV': 'https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/molnet_publish/hiv.zip'
}

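def obtain_compound_embedding(net, file, file_type = 'df'):
    # Return drug embeddings computed by the drug encoder of `net`.
    if file_type == 'df':
        # Dataframe with a precomputed 'drug_encoding' column
        x = np.stack(file['drug_encoding'].values)
    elif file_type == 'array':
        # Already-stacked array of drug encodings
        x = file
    else:
        raise AttributeError("file_type must be 'df' or 'array'")

    return net.model_drug(torch.FloatTensor(x))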

def download_unzip(name, path, file_name):
    if not os.path.exists(path):