Ch04 - Baseline Forecasts using darts, 'MAC000193' not in data set #12

Open
hmzls opened this issue Mar 4, 2023 · 1 comment

hmzls commented Mar 4, 2023

Hi,

I ran into a problem where the household 'MAC000193' was not available in 'selected_blocks_train_missing_imputed.parquet'.

This is probably due to a bug in Ch02 - 02-Preprocessing. There, block_data_path.glob("*.csv") yields the files in whatever order the filesystem reports them, and that order is not guaranteed to be the same across platforms. On a Windows system the files came back in alphabetical order, but on a Linux system they came back in a different order.

In Ch04 - 01-Setting up Experiment Harness, 50 households are sampled. Even though random_state is specified explicitly, the sampled households can differ because the incoming data is ordered differently.
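
To illustrate why a fixed seed is not enough, here is a minimal sketch using only the standard library. It is not the notebook's actual sampling code: sample_households, the seed value, and the generated household IDs are hypothetical stand-ins. The point is only that a seeded sampler picks the same positions, so a different input order gives different households.

```python
import random


def sample_households(household_ids, k, seed=42):
    """Pick k households with a fixed seed (hypothetical stand-in for the notebook's sampling step)."""
    rng = random.Random(seed)
    return rng.sample(list(household_ids), k)


# Hypothetical household IDs, for illustration only.
ids_alphabetical = [f"MAC{i:06d}" for i in range(1, 201)]
ids_other_order = list(reversed(ids_alphabetical))  # same IDs, different arrival order

picked_a = sample_households(ids_alphabetical, k=50)
picked_b = sample_households(ids_other_order, k=50)

# Same seed and same set of IDs, but the selection differs because
# the seeded sampler chooses positions, not IDs.
print(picked_a == picked_b)  # False

# Sorting the IDs first makes the selection independent of arrival order.
stable_a = sample_households(sorted(ids_alphabetical), k=50)
stable_b = sample_households(sorted(ids_other_order), k=50)
print(stable_a == stable_b)  # True
```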

A quick fix is to wrap block_data_path.glob("*.csv") in the Ch02 - 02-Preprocessing notebook with sorted().
Hope this helps somebody who runs into the same problem.
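
A minimal sketch of that fix, assuming block_data_path is a pathlib.Path (or a string path) pointing at the folder with the block CSV files. The helper name list_block_files and the file names below are made up for the demonstration and are not from the notebook.

```python
from pathlib import Path
import tempfile


def list_block_files(block_data_path):
    """Return the CSV files under block_data_path in a stable, sorted order."""
    # Path.glob yields entries in the order the filesystem reports them;
    # sorted() pins that order by comparing the paths.
    return sorted(Path(block_data_path).glob("*.csv"))


# Demonstration on a temporary folder with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["block_10.csv", "block_2.csv", "block_1.csv", "notes.txt"]:
        (Path(tmp) / name).touch()
    names = [f.name for f in list_block_files(tmp)]

print(names)  # ['block_1.csv', 'block_10.csv', 'block_2.csv']
```

Note that the order is lexicographic rather than numeric (block_10 comes before block_2). That is fine here, since the goal is only that every platform sees the same order.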

hmzls changed the title from "Ch04 - Baseline Forecasts, 'MAC000193' not in data set" to "Ch04 - Baseline Forecasts using darts, 'MAC000193' not in data set" on Mar 4, 2023
Collaborator

manujosephv commented Mar 5, 2023

That's awesome! I've edited the code to reflect this as well.

If everybody reported issues and suggested possible solutions, we could make the book and the associated code better for everyone!
