ProteinMPNN_dataset #9810

jdhenaos · 2024-11-27T17:06:00Z

This PR adds a new dataset for the ProteinMPNN model.

The dataset was evaluated using the following code:

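from torch_geometric.loader import DataLoader
# Assumption: PMPNNDataset is importable from the module added in this PR.
from torch_geometric.datasets.proteinmpnn_dataset import PMPNNDataset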

path = "./sample_data"
params = {
    "LIST": f"{path}/pdb_2021aug02_sample/list.csv",
    "VAL": f"{path}/pdb_2021aug02_sample/valid_clusters.txt",
    "DIR": f"{path}/pdb_2021aug02_sample",
    "DATCUT": "2030-Jan-01",
    "RESCUT": 3.5,  # resolution cutoff for PDBs
    "HOMO": 0.70,  # min seq.id. to detect homo chains
}

train_dataset = PMPNNDataset(root=path, params=params, set_type='train')
validation_dataset = PMPNNDataset(root=path, params=params, set_type='val')

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True,
                          num_workers=2)
validation_loader = DataLoader(validation_dataset, batch_size=32,
                               shuffle=True, num_workers=2)

xnuohz

Left some comments. Please add a dataset test and make CI pass :)

xnuohz · 2024-11-28T03:15:53Z

torch_geometric/datasets/proteinmpnn_dataset.py

+    Args:
+      root (str): Root directory where the dataset should be saved.
+      params (dict): Dictionary of parameters for dataset creation:
+                     LIST: Path to the table with metadata.
+                     VAL: Path to list of cluster IDs for model validation.
+                     DIR: Path to dataset.
+                     DATCUT: Date (YYY-MM-DD) threshold of sequence deposition.
+                     RESCUT: Resolution cutoff for PDBs.
+                     HOMO: Minimal sequence identity to detect homodimeric chains.
+      set_type (str): Type of expected data, train ("train") or validation ("val")


Args are not aligned with __init__ parameters

xnuohz · 2024-11-28T03:18:03Z

torch_geometric/datasets/proteinmpnn_dataset.py

+            force_reload=False
+        #name='sample',
+    ) -> None:
+        assert set_type in {'train', 'val'}


missing test?

Suggested change

-        assert set_type in {'train', 'val'}
+        assert set_type in {'train', 'val', 'test'}

xnuohz · 2024-11-28T03:19:44Z

torch_geometric/datasets/proteinmpnn_dataset.py

+            self,
+            root,
+            set_type,  # 'train', 'val', or 'test'
+            params,
+            transform=None,
+            pre_transform=None,
+            pre_filter=None,
+            log=True,
+            force_reload=False
+        #name='sample',


Add type hints
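
For illustration, the signature above could be annotated roughly as follows. This is only a minimal sketch: `PMPNNDatasetSketch` is a hypothetical stand-in that stores its arguments and does not subclass the real dataset base class, and the `Dict[str, Any]` and `Optional[Callable]` annotations are assumptions about what `params` and the transform arguments hold.

```python
from typing import Any, Callable, Dict, Optional


class PMPNNDatasetSketch:
    """Hypothetical stand-in showing only an annotated signature."""

    def __init__(
        self,
        root: str,
        set_type: str,  # 'train', 'val', or 'test'
        params: Dict[str, Any],  # assumed: string keys, mixed value types
        transform: Optional[Callable] = None,
        pre_transform: Optional[Callable] = None,
        pre_filter: Optional[Callable] = None,
        log: bool = True,
        force_reload: bool = False,
    ) -> None:
        # Same check as in the diff, extended to 'test' as suggested above.
        assert set_type in {'train', 'val', 'test'}
        self.root = root
        self.set_type = set_type
        self.params = params
        self.transform = transform
        self.pre_transform = pre_transform
        self.pre_filter = pre_filter
        self.log = log
        self.force_reload = force_reload


# Usage with a subset of the params from the PR description.
ds = PMPNNDatasetSketch(
    root="./sample_data",
    set_type="val",
    params={"RESCUT": 3.5, "HOMO": 0.70},
)
```

Keeping `set_type` as a plain `str` matches the runtime `assert` in the diff; a stricter `typing.Literal` annotation would be another option if static checking is wanted.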

xnuohz · 2024-11-28T16:14:42Z

torch_geometric/datasets/proteinmpnn_dataset.py

+        osp.join(self.root)
+        fs.rm(self.raw_dir)
+
+    def build_training_clusters(self, params, debug):


debug mode should be removed.

xnuohz · 2024-11-28T16:23:04Z

torch_geometric/datasets/proteinmpnn_dataset.py

+            item_graph = Data(
+                seq=seq,
+                xyz=torch.cat(xyz, dim=0),
+                idx=torch.cat(idx, dim=0),
+                masked=torch.Tensor(masked).int(),
+                #label = self.item[0]
+            )


I didn't see x and edge_index in the dataset. ProteinMPNN's input only includes sequence data, right?

ProteinMPNN_dataset

b44cc9d

jdhenaos requested a review from wsad1 as a code owner November 27, 2024 17:06

[pre-commit.ci] auto fixes from pre-commit.com hooks

2fbfde7

for more information, see https://pre-commit.ci

xnuohz reviewed Nov 28, 2024
