ProteinMPNN_dataset #9810
Left some comments. Please add a dataset test and make CI pass :)
Args:
    root (str): Root directory where the dataset should be saved.
    params (dict): Dictionary of parameters for dataset creation:
        LIST: Path to the table with metadata.
        VAL: Path to list of cluster IDs for model validation.
        DIR: Path to dataset.
        DATCUT: Date (YYYY-MM-DD) threshold of sequence deposition.
        RESCUT: Resolution cutoff for PDBs.
        HOMO: Minimal sequence identity to detect homodimeric chains.
    set_type (str): Type of expected data, train ("train") or validation ("val").
Args are not aligned with `__init__` parameters.
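One way to catch this kind of mismatch mechanically is to compare the constructor signature against the `Args:` entries in its docstring. The sketch below is only an illustration: `undocumented_params` and `DatasetSketch` are hypothetical names, not part of this PR, and the regex assumes Google-style entries such as `root (str): ...`. It reports missing names only and does not check ordering.

```python
import inspect
import re


def undocumented_params(func):
    """Return parameters of `func` that its docstring never lists."""
    doc = inspect.getdoc(func) or ""
    # Collect names from entries shaped like "root (str): ..." or "log: ...".
    documented = set(
        re.findall(r"^\s*(\w+)(?:\s*\([^)]*\))?:", doc, flags=re.MULTILINE)
    )
    params = [n for n in inspect.signature(func).parameters if n != "self"]
    return [n for n in params if n not in documented]


class DatasetSketch:
    # Hypothetical stand-in for the dataset class under review.
    def __init__(self, root, set_type, params, transform=None, log=True):
        """Toy constructor whose docstring omits two parameters.

        Args:
            root (str): Root directory where the dataset should be saved.
            params (dict): Dictionary of parameters for dataset creation.
            set_type (str): Type of expected data.
        """


missing = undocumented_params(DatasetSketch.__init__)
print(missing)  # ['transform', 'log']
```

A check like this could run inside the dataset test so the docstring and signature cannot drift apart unnoticed.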
        force_reload=False
        #name='sample',
    ) -> None:
        assert set_type in {'train', 'val'}
Missing `test`?
Suggested change:
- assert set_type in {'train', 'val'}
+ assert set_type in {'train', 'val', 'test'}
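As a possible variation on the suggestion above, the check could raise an explicit exception rather than use `assert`, since assertions are skipped when Python runs with `-O`. This is a minimal sketch; `VALID_SET_TYPES` and `check_set_type` are hypothetical names introduced for illustration and do not appear in the PR.

```python
# Hypothetical constant listing the accepted dataset splits.
VALID_SET_TYPES = ("train", "val", "test")


def check_set_type(set_type):
    """Return `set_type` unchanged, or raise if it is not a known split."""
    if set_type not in VALID_SET_TYPES:
        raise ValueError(
            f"set_type must be one of {VALID_SET_TYPES}, got {set_type!r}"
        )
    return set_type


split = check_set_type("test")
```

Calling `check_set_type("debug")` would raise `ValueError` with a message naming the accepted values.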
        self,
        root,
        set_type,  # 'train', 'val', or 'test'
        params,
        transform=None,
        pre_transform=None,
        pre_filter=None,
        log=True,
        force_reload=False
        #name='sample',
Add type hints
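A sketch of what the annotated signature might look like, using only the parameters quoted above. The class name `ProteinMPNNDatasetSketch` is a placeholder and the class deliberately has no base class so the snippet runs on its own; the annotation choices (`Dict[str, Any]` for `params`, `Optional[Callable]` for the transform hooks) are assumptions, not the author's final code.

```python
from typing import Any, Callable, Dict, Optional


class ProteinMPNNDatasetSketch:
    """Placeholder class used only to show the annotated constructor."""

    def __init__(
        self,
        root: str,
        set_type: str,  # 'train', 'val', or 'test'
        params: Dict[str, Any],
        transform: Optional[Callable] = None,
        pre_transform: Optional[Callable] = None,
        pre_filter: Optional[Callable] = None,
        log: bool = True,
        force_reload: bool = False,
    ) -> None:
        # The real constructor would forward most of these to its base class;
        # here they are only stored so the example is self-contained.
        self.root = root
        self.set_type = set_type
        self.params = params
        self.transform = transform
        self.pre_transform = pre_transform
        self.pre_filter = pre_filter
        self.log = log
        self.force_reload = force_reload


ds = ProteinMPNNDatasetSketch("data", "train", {"RESCUT": 3.5})
```

The `RESCUT` value in the usage line is an arbitrary placeholder.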
        osp.join(self.root)
        fs.rm(self.raw_dir)

    def build_training_clusters(self, params, debug):
Debug mode should be removed.
item_graph = Data(
    seq=seq,
    xyz=torch.cat(xyz, dim=0),
    idx=torch.cat(idx, dim=0),
    masked=torch.Tensor(masked).int(),
    #label = self.item[0]
)
I didn't see `x` and `edge_index` in the dataset. ProteinMPNN's input only includes sequence data, right?
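The thread does not settle whether the dataset should carry graph fields. Purely as an illustration of what an `edge_index`-style field could contain if one were wanted, the sketch below derives k-nearest-neighbour edges from 3D coordinates such as those stored in `xyz`. The function `knn_edges`, the toy coordinates, and the (source, target) pair layout are all assumptions made for this example, not part of the PR.

```python
import math


def knn_edges(coords, k):
    """Return (source, target) pairs linking each point to its k nearest neighbours."""
    edges = []
    for target, p in enumerate(coords):
        # Distance from every other point to the current one.
        dists = [
            (math.dist(p, q), source)
            for source, q in enumerate(coords)
            if source != target
        ]
        # Keep the k closest points as incoming edges of `target`.
        for _, source in sorted(dists)[:k]:
            edges.append((source, target))
    return edges


# Four toy points on a line standing in for residue coordinates.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (6.0, 0.0, 0.0)]
edges = knn_edges(coords, k=1)
```

Transposing the resulting pair list gives the two-row layout that an `edge_index` tensor conventionally uses.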
A new dataset for the ProteinMPNN model was added.
The dataset was evaluated using the following code: