Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attributes incorrectly labeled with extract_dataset within a Dask client in a Jupyter Notebook #176

Open
1 task
mccrayc opened this issue Mar 24, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@mccrayc
Copy link
Collaborator

mccrayc commented Mar 24, 2023

Setup Information

  • xscen version: 0.5.0
  • Python version: 3.10.8
  • JupyterLab version: 3.6.1

Description

A pretty specific issue here: when extracting data with xs.extract_dataset, the attributes within the resulting dataset are labeled "intake_esm_attrs:..." rather than "cat:..." when extraction is done within a Dask client, and within JupyterLab. Doing the same thing but running from within a script produces the correct attribute names. Running outside of a Dask client works as normal, both in a script and in a Notebook.

Steps To Reproduce

Simple example. Attributes are correct for this version (e.g.: 'cat:id':'CMIP6_ScenarioMIP_CCCma_CanESM5_ssp370_r1i1p1f1_global'):

import xscen as xs
cat='/tank/scenario/catalogues/simulation.json'  
variables_and_freqs={'tas':'MS'}
other_search_criteria= {"source": 'CanESM5*', "experiment":'ssp370', "member":'r1i1p1f1'}
xr_combine_kwargs ={'coords': 'minimal', 
                   'data_vars': 'minimal', 
                   'compat': 'override'}

periods = [2028,2029]

cat_ref = xs.search_data_catalogs(data_catalogs=[cat],
                                      variables_and_freqs=variables_and_freqs,
                                      other_search_criteria= other_search_criteria,
                                      periods = periods,
                                      allow_resampling=True, 
                                      allow_conversion = True)

ds_dict = xs.extract_dataset(cat_ref['CMIP6_ScenarioMIP_CCCma_CanESM5_ssp370_r1i1p1f1_global'],
                            variables_and_freqs = variables_and_freqs,
                        xr_combine_kwargs  = xr_combine_kwargs )  `

However, if xs.extract_dataset is wrapped with a Dask client, attributes keys are incorrect (e.g., 'intake_esm_attrs:id': 'CMIP6_ScenarioMIP_CCCma_CanESM5_ssp370_r1i1p1f1_global'):

from dask.distributed import Client
with Client(n_workers=1, threads_per_worker=4,
               memory_limit='12GB') as client:
    ds_dict = xs.extract_dataset(cat_ref['CMIP6_ScenarioMIP_CCCma_CanESM5_ssp370_r1i1p1f1_global'],
                            variables_and_freqs = variables_and_freqs,
                        xr_combine_kwargs  = xr_combine_kwargs ) 
    print(ds_dict['MS'].attrs)

Additional context

Workaround suggested by @aulemahal :

Before xs.extract_dataset, after the "with Client(...) as client:" insert client.run(lambda: xs.__version__).

Contribution

  • I would be willing/able to open a Pull Request to address this bug.
@mccrayc mccrayc added the bug Something isn't working label Mar 24, 2023
@aulemahal
Copy link
Collaborator

@aulemahal
Copy link
Collaborator

Seems related to how the client workers are created and then on how the dask functions are pickled and sent to them.

In a script, this happens if the xscen import is done after if __name__ == '__main__'. We don't do that usually of course. However, in a notebook, all code is executed after the equivalent.

@aulemahal
Copy link
Collaborator

From the answer I got on StackOverflow, I don't see any easy xscen-level solution. The issue is that intake-esm makes use of a global state variable (the options) within a dask Delayed function, although « Dask is generally aiming to be functional/stateless, so that each function call produces results based only on the arguments it is supplied ».

Passing the options as an argument to the Delayed would fix that. On our side, the Client.run hack seems to be good enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants