Suggestion: Add caching to package or saving/loading code to examples. #14

Open
manfred-lindmark opened this issue Jul 12, 2024 · 1 comment

First of all, thanks for the great tool; getting datasets should always be this simple.

I have a suggestion that would make it a bit easier to get started with these datasets. Downloading from UCI was pretty slow for me (several minutes for 6 MB, maybe because of my university's VPN), so it would be good to save the dataset locally the first time it is downloaded.

Caching it with Python's tempfile module might be a good idea; otherwise, the usage example for this package could show how to save the dataset, for example:

from ucimlrepo import fetch_ucirepo
import pickle
import os

dataset_id = 2
fname = f"id_{dataset_id}.pkl"

if os.path.isfile(fname):
    with open(fname, "rb") as f:
        data = pickle.load(f)
else:
    data = fetch_ucirepo(id=dataset_id)
    with open(fname, "wb") as f:
        pickle.dump(data, f)
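
The tempfile idea could look roughly like the sketch below. It assumes the fetched object can be pickled. `cached_fetch`, its parameters, and the cache file naming are hypothetical names chosen for illustration; none of them are part of ucimlrepo. The download step is passed in as a callable, so the sketch uses only the standard library.

```python
import pickle
import tempfile
from pathlib import Path


def cached_fetch(name, fetch, cache_dir=None):
    """Return fetch(), reusing a pickle file from an earlier call if one exists.

    Hypothetical helper for illustration; not part of ucimlrepo.
    """
    # Default to the system temp directory when no cache directory is given.
    directory = Path(cache_dir) if cache_dir is not None else Path(tempfile.gettempdir())
    path = directory / f"{name}.pkl"

    # Cache hit: load the stored object instead of downloading again.
    if path.is_file():
        with path.open("rb") as f:
            return pickle.load(f)

    # Cache miss: fetch, then write to a temporary name first so that an
    # interrupted dump does not leave a truncated file under the final name.
    data = fetch()
    tmp_path = directory / f"{name}.pkl.tmp"
    with tmp_path.open("wb") as f:
        pickle.dump(data, f)
    tmp_path.replace(path)
    return data
```

With ucimlrepo, the call would presumably be `cached_fetch("ucimlrepo_id_2", lambda: fetch_ucirepo(id=2))`. Files in the system temp directory may be cleaned up by the operating system, so passing `cache_dir` is the safer choice for a longer-lived cache.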

ripaul commented Aug 6, 2024

I was just looking into exactly this, and it turns out you cannot pickle the dotdict objects that ucimlrepo uses. At least for me, it fails with a strange error:

python3 test.py 
Traceback (most recent call last):
  File "/home/rpaul/proj/bnn-benchmark/src/test.py", line 10, in <module>
    data = pickle.load(f)
TypeError: 'NoneType' object is not callable

The error can be googled, however, and leads to this Stack Overflow answer: https://stackoverflow.com/a/2050357.
Adding the required methods to the dotdict resolves the issue for me. I opened a pull request for the change. It doesn't yet cache the downloaded data, but at least it allows you to implement caching manually.
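
For readers who want to see the idea, here is a minimal sketch of a dot-access dict with explicit pickle hooks. It assumes the class follows the common recipe of `__getattr__ = dict.get`; the name `DotDict` is hypothetical, and this is neither ucimlrepo's actual class nor necessarily what the pull request does. With that recipe, looking up a hook the class does not define yields `None` instead of raising `AttributeError`, which is the likely source of the "'NoneType' object is not callable" message. Defining `__getstate__` and `__setstate__` on the class avoids that fallback.

```python
import pickle


class DotDict(dict):
    """Hypothetical dot-access dict for illustration; not ucimlrepo's class."""

    # Common recipe: attribute access maps to dictionary access.
    # Missing attributes return None because dict.get never raises.
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

    # Explicit pickle hooks, found by normal attribute lookup so that
    # pickle never falls through to __getattr__ for them.
    def __getstate__(self):
        return dict(self)

    def __setstate__(self, state):
        self.update(state)


# Round trip through pickle and check that dot access still works.
original = DotDict(name="demo", rows=[1, 2, 3])
restored = pickle.loads(pickle.dumps(original))
assert restored == original
assert restored.name == "demo"
```

Once an object like this round-trips, a manual caching script along the lines of the one in the first comment should work, provided everything else inside the fetched result is picklable too.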
