Problem in reading the LMDB dataset #979

vishank-u · 2025-01-22T11:02:14Z

What would you like to report?

Hi,

I would like to read the LMDB dataset (IS2RE) of ODAC23, but I cannot find the legacy tutorial that shows the schema or how to query this data for one of the materials presented in the original paper. Could you point me to the correct dataloader or to some scripts for this purpose?

Thanks

lbluque · 2025-01-22T23:52:32Z

Hi @vishank-u 👋

You should be able to read the ODAC23 datasets using LmdbDataset in fairchem.core.datasets.lmdb_dataset:

from fairchem.core.datasets import LmdbDataset

dataset = LmdbDataset(config=dict(src="path_to_ODAC23", r_energy=True, r_forces=True))

# print a datapoint, you can also use a torch DataLoader to loop through batches
print(dataset[0])

Let me know if you run into more issues.

vishank-u · 2025-01-23T10:44:14Z

Hi @lbluque ,

Thanks for sharing the information. It worked, and I can now view the dataset entries. I have two more related questions:

To analyze a key property such as energy, I am currently collecting the values with torch.tensor:

import torch
from fairchem.core.datasets.lmdb_dataset import LmdbDataset

file_path = "../is2res_train_val_test_lmdbs/data/is2re/all"

dataset = LmdbDataset({"src": file_path + "/train"})
# Collect the relaxed energy of every entry (one full pass over the dataset)
energies = torch.tensor([data.y_relaxed for data in dataset])

but it takes a long time to process, understandably so, as the dataset is quite big. Is this the correct way to analyze the data, or is there a better way? Also, is it better to store the energies tensor in the same location to avoid repeating this step every time?

Is there a mapping key to query certain MOFs either by mp-id or by MOF name/id like POLDUQ?

Thanks in advance for answering my questions.

Regards,
Vishank

lbluque · 2025-01-23T17:14:21Z

Looping over the whole dataset in a list comprehension will take a long time. There isn't a straightforward way to query our datasets. An alternative is to loop through batches of data using a torch DataLoader.
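On the question of storing the energies so the slow pass is not repeated: a minimal sketch of a cache-on-first-use helper follows. It is not part of fairchem. The helper name load_or_build_energies, the JSON cache format, and the cache location are all assumptions made for illustration, and the toy list of SimpleNamespace objects only stands in for an LmdbDataset so that the snippet runs without the ODAC23 files.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


def load_or_build_energies(dataset, cache_path, field="y_relaxed"):
    """Return one float per datapoint, reusing the cache file if it exists.

    Hypothetical helper: `dataset` is anything that supports len() and
    integer indexing, which is what an LmdbDataset is assumed to provide.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: skip the expensive pass over the dataset entirely.
        return json.loads(cache_path.read_text())
    # Cache miss: one pass over the dataset, keeping only the scalar field.
    energies = [float(getattr(dataset[i], field)) for i in range(len(dataset))]
    cache_path.write_text(json.dumps(energies))
    return energies


# Stand-in records; with the real data, pass the LmdbDataset instance instead.
toy_dataset = [SimpleNamespace(y_relaxed=e) for e in (-0.0255, 1.5, -2.0)]

with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "energies.json"
    first = load_or_build_energies(toy_dataset, cache_file)  # builds and writes
    second = load_or_build_energies([], cache_file)  # served from the cache
```

The second call receives an empty dataset and still returns the full list, which shows that the cached file is used. With the real dataset, the returned list can be wrapped in torch.tensor afterwards if a tensor is needed.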

@anuroopsriram can you comment on this question:

Is there a mapping key to query certain MOFs either by mp-id or by MOF name/id like POLDUQ?

anuroopsriram · 2025-01-24T18:11:56Z

Is there a mapping key to query certain MOFs either by mp-id or by MOF name/id like POLDUQ?

You can loop through the dataset and look at the name field. Unfortunately we don't have a simple way to query other than looping through the whole dataset.
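Since the only option is to loop through the whole dataset, it may help to do that loop once and keep a lookup table from a field value to the matching indices. The sketch below is a hedged illustration and not a fairchem API: build_index is a hypothetical helper, the field name is a parameter because it is not certain which fields a given LMDB actually stores, and the toy records stand in for real datapoints.

```python
from collections import defaultdict
from types import SimpleNamespace


def build_index(dataset, field):
    """Map each value of `field` to the list of dataset indices that carry it.

    Hypothetical helper. Returns (index, n_missing), where n_missing counts
    datapoints that have no such field. Values must be hashable (for example
    int or str); tensor values would need converting first.
    """
    index = defaultdict(list)
    n_missing = 0
    for i in range(len(dataset)):
        key = getattr(dataset[i], field, None)
        if key is None:
            n_missing += 1  # this datapoint does not store the field
            continue
        index[key].append(i)
    return dict(index), n_missing


# Stand-in records; with the real data, pass the LmdbDataset instance instead.
toy_dataset = [
    SimpleNamespace(sid=2472718, id="0_0"),
    SimpleNamespace(sid=17, id="0_1"),
    SimpleNamespace(sid=2472718, id="0_2"),
]

by_sid, n_without_sid = build_index(toy_dataset, "sid")
by_name, n_without_name = build_index(toy_dataset, "name")
```

If the requested field is absent from every datapoint, the index comes back empty and the missing count equals the dataset size, which gives a quick check of whether a field such as name is present at all before relying on it.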

vishank-u · 2025-01-27T12:47:54Z

Hi @anuroopsriram ,

Thanks for the suggestion. I tried to look at the name field, but the datapoints do not seem to have one. Here is the printout of one datapoint:

Data(edge_index=[2, 2964], pos=[86, 3], cell=[1, 3, 3], atomic_numbers=[86], natoms=86, cell_offsets=[2964, 3], force=[86, 3], distances=[2964], fixed=[86], sid=2472718, tags=[86], y_init=6.282500615000004, y_relaxed=-0.025550085000020317, pos_relaxed=[86, 3], id='0_0')

Let me know if I am missing something.

vishank-u · 2025-02-05T16:46:55Z

Hi @anuroopsriram,

Even when looping through the dataset, is there a mapping from "sid" (or some other field) to the corresponding MOF structure, or does one need to create an ASE object from each entry to filter for the correct candidate?

lbluque added the "question" label (Further information is requested) on Jan 22, 2025
