Problem in reading the LMDB dataset #979
Comments
Hi @vishank-u 👋 You should be able to read the ODAC23 datasets using:
from fairchem.core.datasets import LmdbDataset
dataset = LmdbDataset(config=dict(src="path_to_ODAC23", r_energy=True, r_forces=True))
# print a datapoint, you can also use a torch DataLoader to loop through batches
print(dataset[0])
Let me know if you run into more issues.
Hi @lbluque , Thanks for sharing the information. It worked, I can view the dataset entries. I have 2 more related questions:
import torch
from fairchem.core.datasets.lmdb_dataset import LmdbDataset
file_path = "../is2res_train_val_test_lmdbs/data/is2re/all"
dataset = LmdbDataset({"src": file_path + "/train"})
energies = torch.tensor([data.y_relaxed for data in dataset])
but it takes a lot of time to process, understandably so, as the dataset is quite big. Is this the correct way to analyze the data, or is there a better way? Also, is it better to store this tensor
Thanks in advance for answering my questions. Regards,
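On the question of storing the result, one option is to pay for the full pass once and cache the values on disk. The following is only a minimal sketch under assumptions: `collect_relaxed_energies`, the JSON cache file, and the toy stand-in dataset are hypothetical and not part of fairchem; the only thing taken from this thread is that each entry exposes a `y_relaxed` attribute and that the dataset supports `len()` and integer indexing.

```python
import json
import os
import tempfile
from types import SimpleNamespace


def collect_relaxed_energies(dataset, cache_path):
    """Return y_relaxed for every entry, reading the dataset at most once."""
    # If a previous run already wrote the cache, skip the expensive pass.
    if os.path.exists(cache_path):
        with open(cache_path) as fh:
            return json.load(fh)
    # Single pass over the dataset; plain floats keep the result JSON-serializable.
    energies = [float(dataset[i].y_relaxed) for i in range(len(dataset))]
    with open(cache_path, "w") as fh:
        json.dump(energies, fh)
    return energies


# Stand-in for the real dataset (hypothetical values) so the sketch runs on its own.
toy_dataset = [SimpleNamespace(y_relaxed=e) for e in (-0.5, 0.25, -1.0)]
cache_file = os.path.join(tempfile.mkdtemp(), "y_relaxed.json")

first = collect_relaxed_energies(toy_dataset, cache_file)  # reads the dataset
second = collect_relaxed_energies([], cache_file)          # served from the cache
```

With the real data, the `LmdbDataset` object from the snippet above would presumably be passed in place of `toy_dataset`, and the cached list could be turned into a tensor with `torch.tensor(...)` afterwards.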
Looping over the whole dataset in a list comprehension will take a long time. There isn't a straightforward way to query our datasets. An alternative is to loop through batches of data using a torch DataLoader. @anuroopsriram can you comment on this question:
You can loop through the dataset and look at the
Hi @anuroopsriram, Thanks for the suggestion, I tried to look at the
Data(edge_index=[2, 2964], pos=[86, 3], cell=[1, 3, 3], atomic_numbers=[86], natoms=86, cell_offsets=[2964, 3], force=[86, 3], distances=[2964], fixed=[86], sid=2472718, tags=[86], y_init=6.282500615000004, y_relaxed=-0.025550085000020317, pos_relaxed=[86, 3], id='0_0')
Let me know if I am missing something.
Hi @anuroopsriram, even when looping through the dataset, is there a mapping from "sid" (or some other field) to the corresponding MOF structure, or does one need to create an ase object from each entry to filter for the correct candidate?
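Whether a mapping from `sid` to MOF names exists is not answered in this thread. Assuming one has (or later finds) the `sid` values of interest, a one-time pass can at least build a lookup from `sid` to dataset positions, so that later filtering does not need another full loop. This is a rough sketch: `build_sid_index` and the toy entries are hypothetical, and only the `sid` attribute (with the example value 2472718) comes from the record printed above.

```python
from types import SimpleNamespace


def build_sid_index(dataset):
    """Map each system id (sid) to the list of dataset positions that carry it."""
    index = {}
    for position in range(len(dataset)):
        sid = int(dataset[position].sid)
        # A sid may appear more than once, so keep every matching position.
        index.setdefault(sid, []).append(position)
    return index


# Stand-in entries (hypothetical) so the sketch runs without the real LMDB files.
toy_dataset = [
    SimpleNamespace(sid=2472718, natoms=86),
    SimpleNamespace(sid=1000001, natoms=120),
    SimpleNamespace(sid=2472718, natoms=86),
]

sid_index = build_sid_index(toy_dataset)
# Fetch every entry for one sid without looping over the dataset again.
matches = [toy_dataset[i] for i in sid_index.get(2472718, [])]
```

The index only answers "which entries have this sid"; linking a sid to a named MOF would still need whatever metadata the dataset authors provide.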
What would you like to report?
Hi,
I would like to read the LMDB dataset (IS2RE) of ODAC23, but I cannot find the legacy tutorial that shows the schema or how to query this data for one of the materials presented in the original paper. Could you point me to the correct dataloader or to some scripts for this purpose?
Thanks