-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Convert some of the Wrap-up M4 content into exercise #731
Conversation
# preprocessor.fit(data) | ||
# feature_names = ( | ||
# preprocessor.named_transformers_["onehotencoder"].get_feature_names_out( | ||
# categorical_columns | ||
# ) | ||
# ).tolist() | ||
# feature_names += numerical_columns | ||
# feature_names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For info: I had to comment these lines by hand as it was rising a flake8 error F821 undefined name
when building the exercise from the solution.
We may need to think of a better way to avoid this situation in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can add the F821
failure to the ignore list in the flake8 configuration. Since we run all the code of the notebooks, including the solutions when building the jupyterbook we should be safe. The only code we do not check automatically are the solutions to the wrap up quiz but they are in the private repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is that preprocessor
is defined in the solution but not in the exercise. So I think it will raise an error when building the jupyterbook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. We can keep it that way then.
Maybe add a comment to state to reuse the preprocessor
variable defined in the solution of the previous question.
Similarly to the private review made on the wrap up quiz, I am worried that fitting such a pipeline might require too much RAM and CPU on jupyterhub and as a result might crash or be painfully slow to execute. We should either trim the number of categorical features using Furthermore, I would also replace standard scaling by |
…rap_up_M4_content
I decided not to do so because feature engineering actually degraded the performance of the spline model. |
Hum, it seems that I broken the jupyter preview... |
I will have to merge this PR in its current state to possibly debug the synchro of notebooks and their impact on FUN. We can always iterate on it on future PRs. |
Fixes #707.
Follows #711, which made the
logistic_regression_non_linear
notebook redundant. This PR creates an exercise to show the use of feature engineering using a more realistic dataset. In particular, demonstrates feature interaction when using one-hot encoding.Note: I had to build the exercises from the solutions to correctly render the index.