attrs_prefix not working in jupyter notebook with dask #316

Closed
1 task
juliettelavoie opened this issue Jan 18, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@juliettelavoie
Contributor

Setup Information

  • xscen version: 0.8
  • Python version: 3.11.5
  • Operating System: Linux

Description

The attrs_prefix "cat:" is not applied when working in a Jupyter notebook under a dask client. Instead, the attributes get the "intake_esm_attrs:" prefix.

Steps To Reproduce

In a Jupyter notebook:

from dask.distributed import Client
import xscen as xs

with Client(n_workers=2, threads_per_worker=5, memory_limit="30GB",
            local_directory='/exec/jlavoie/tmp_eg6/',
            dashboard_address=6786, silence_logs=True):
    cat_sim = xs.DataCatalog('simulation.json')
    ds = cat_sim.search(id='CanDCS-U6_CMIP6_ScenarioMIP_MIROC_MIROC6_ssp585_r1i1p1f1_CAN').to_dask()
ds.attrs

gives

...
'intake_esm_attrs:institution': 'MIROC',
...

but the same call without the dask client

ds = cat_sim.search(id='CanDCS-U6_CMIP6_ScenarioMIP_MIROC_MIROC6_ssp585_r1i1p1f1_CAN').to_dask()
ds.attrs

gives

...
'cat:institution': 'MIROC',
...

Additional context

  • Adding intake_esm.set_options(attrs_prefix="cat") in the notebook does nothing.
  • It works well when running a Python file, but not in a Jupyter notebook or the PyCharm IPython console.
  • Maybe this is an intake_esm issue?

Contribution

  • I would be willing/able to open a Pull Request to address this bug.
@juliettelavoie added the bug label Jan 18, 2024
@juliettelavoie
Contributor Author

Oops, just saw #176.

@aulemahal
Collaborator

In a Python file, Dask's multiprocessing initializes the workers with the state of the Python process before if __name__ == '__main__'. In a notebook, all the code is executed after the equivalent point. Thus, imports made in any cell are not included in the initial state of the workers.
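
One way to check this from the notebook is to ask each worker which modules it has already imported. The sketch below is only an illustration, not code from xscen or from this thread: `module_is_loaded` and `loaded_on_workers` are hypothetical helper names, and `client` is assumed to be a `dask.distributed.Client`, whose `run` method executes a function on every worker and returns a dict keyed by worker address.

```python
import sys


def module_is_loaded(module_name):
    """Return True if `module_name` has already been imported in this process."""
    return module_name in sys.modules


def loaded_on_workers(client, module_names=("intake_esm", "xscen")):
    """Ask every worker whether each module is present in its sys.modules.

    `client` is assumed to expose run(function, *args), as
    dask.distributed.Client does. The result maps each module name to the
    per-worker dict returned by client.run.
    """
    return {name: client.run(module_is_loaded, name) for name in module_names}
```

Calling `loaded_on_workers(client)` before and after `to_dask()` should show whether xscen ever reaches the workers in a given session.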

When to_dask is executed, the code sent to the workers includes references to intake-esm, which is then imported automatically. However, it does not include any reference to xscen, so xscen is not imported on the workers and the attrs_prefix option is never updated there.
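
Given that explanation, one possible workaround is to force the import on the workers before calling to_dask. This is a minimal sketch under stated assumptions, not a fix confirmed in this thread: `import_on_worker` and `import_on_all_workers` are hypothetical helper names, `client` is assumed to be a `dask.distributed.Client`, and the approach relies on importing xscen being enough to update the option, as described above.

```python
import importlib


def import_on_worker(module_name="xscen"):
    """Import `module_name` in the current process and return its name.

    Meant to run on a dask worker so that import-time side effects
    (assumed here: xscen setting intake-esm's attrs_prefix) happen there too.
    """
    module = importlib.import_module(module_name)
    return module.__name__


def import_on_all_workers(client, module_name="xscen"):
    """Run the import on every worker through client.run.

    `client` is assumed to expose run(function, *args), as
    dask.distributed.Client does; the per-worker results are returned.
    """
    return client.run(import_on_worker, module_name)
```

In the notebook, this would be called right after entering a `with Client(...) as client:` block and before `to_dask()`. `Client.run` only reaches the workers that exist when it is called, so a worker that restarts later would need the call again.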
