Suggestion: Add caching to package or saving/loading code to examples. #14

manfred-lindmark · 2024-07-12T08:34:08Z

First of all, thanks for the great tool; getting datasets should always be this simple.

I have a suggestion that would make it a bit easier to get started with these datasets. Downloading from UCI was quite slow for me (several minutes for 6 MB, possibly because of my university's VPN), so it would help to save a dataset locally the first time it is downloaded.

One option would be to cache it with Python's `tempfile` module; alternatively, saving and loading could be added to the usage examples, for instance:

```python
from ucimlrepo import fetch_ucirepo
import pickle
import os

dataset_id = 2
fname = f"id_{dataset_id}.pkl"

if os.path.isfile(fname):
    # Reuse the copy saved on a previous run.
    with open(fname, "rb") as f:
        data = pickle.load(f)
else:
    # First run: download from UCI, then save a local copy.
    data = fetch_ucirepo(id=dataset_id)
    with open(fname, "wb") as f:
        pickle.dump(data, f)
```
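Along the lines of the `tempfile` suggestion, here is a minimal sketch of a reusable cache helper. `cached_fetch`, `DEFAULT_CACHE_DIR`, and the `ucimlrepo_cache` folder name are hypothetical and not part of ucimlrepo. The helper takes the fetch function as a parameter so it can wrap `fetch_ucirepo` or anything else, and it assumes the fetched object can be pickled and unpickled.

```python
import os
import pickle
import tempfile
from typing import Any, Callable

# Hypothetical default location: a subfolder of the system temp directory.
# The OS may clean this directory, in which case the dataset is re-downloaded.
DEFAULT_CACHE_DIR = os.path.join(tempfile.gettempdir(), "ucimlrepo_cache")


def cached_fetch(dataset_id: int,
                 fetch: Callable[[int], Any],
                 cache_dir: str = DEFAULT_CACHE_DIR) -> Any:
    """Return the dataset for dataset_id, calling fetch(dataset_id) only on a cache miss.

    Assumes the object returned by fetch is picklable.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"id_{dataset_id}.pkl")

    # Cache hit: load the previously saved copy.
    if os.path.isfile(path):
        with open(path, "rb") as f:
            return pickle.load(f)

    # Cache miss: fetch, then write to a temporary file and rename it into
    # place, so an interrupted write never leaves a truncated cache entry.
    data = fetch(dataset_id)
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(data, f)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return data
```

For example, `cached_fetch(2, lambda i: fetch_ucirepo(id=i))` would download dataset 2 once and reuse the pickle on later calls. Passing an explicit `cache_dir` trades the convenience of the temp directory for a cache that the OS will not clear.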

ripaul · 2024-08-06T12:03:54Z

I was just looking into exactly this, and it turns out the `dotdict` objects that ucimlrepo uses cannot be round-tripped through pickle. At least for me, loading the pickle fails with a strange error:

```
python3 test.py
Traceback (most recent call last):
  File "/home/rpaul/proj/bnn-benchmark/src/test.py", line 10, in <module>
    data = pickle.load(f)
TypeError: 'NoneType' object is not callable
```

Searching for that error leads to this Stack Overflow answer: https://stackoverflow.com/a/2050357.
Adding the methods described there to the `dotdict` class resolves the issue for me, and I have opened a pull request with the change. It does not cache the downloaded data yet, but it at least makes it possible to implement caching manually.
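The `TypeError` is consistent with a dict subclass that maps missing attributes to `None` (the common `__getattr__ = dict.get` idiom): when unpickling looks up `__setstate__` on the new object, the lookup returns `None` instead of raising `AttributeError`, and pickle then tries to call it. Below is a minimal sketch of that kind of fix, assuming this idiom. `DotDict` is a hypothetical stand-in, not ucimlrepo's actual class, and the change in the pull request may differ.

```python
import pickle


class DotDict(dict):
    """Hypothetical dict with attribute access (not ucimlrepo's source)."""

    # Missing attributes return None instead of raising AttributeError.
    # This is what can confuse pickle when it probes for hooks such as
    # __setstate__: it gets None back and then tries to call it.
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

    # Defining the pickle hooks explicitly means normal attribute lookup
    # finds real methods, so the __getattr__ fallback is never consulted.
    def __getstate__(self):
        return dict(self)

    def __setstate__(self, state):
        self.update(state)


if __name__ == "__main__":
    d = DotDict(name="example", n_rows=100)
    restored = pickle.loads(pickle.dumps(d))
    print(type(restored).__name__, restored.name, restored.n_rows)
```

With the explicit hooks in place, the object survives a `pickle.dumps`/`pickle.loads` round trip with its keys and attribute access intact, which is what a manual cache like the one in the original post needs.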

ripaul mentioned this issue on Aug 6, 2024 in the open pull request "Update dotdict.py" #18.
