MAINT Fix typos and wording across the mooc #764

Merged (5 commits, Apr 26, 2024)
2 changes: 1 addition & 1 deletion python_scripts/02_numerical_pipeline_introduction.py
@@ -59,7 +59,7 @@
data.head()

# %% [markdown]
-# We can now linger on the variables, also denominated features, that we later
+# We can now focus on the variables, also denominated features, that we later
# use to build our predictive model. In addition, we can also check how many
# samples are available in our dataset.

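The hunk above refers to inspecting the features and counting the available samples. A minimal sketch of that inspection step, assuming the adult census CSV and the `class` target column used elsewhere in the MOOC (both names are assumptions, not taken from this diff):

import pandas as pd

# Assumed dataset path and target column; adjust to the notebook's actual setup.
adult_census = pd.read_csv("../datasets/adult-census.csv")
target_name = "class"
data = adult_census.drop(columns=[target_name])

# Number of samples and number of features (columns) available for modeling.
print(f"The dataset contains {data.shape[0]} samples and {data.shape[1]} features")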
9 changes: 5 additions & 4 deletions python_scripts/03_categorical_pipeline.py
@@ -253,7 +253,7 @@
# and check the generalization performance of this machine learning pipeline using
# cross-validation.
#
-# Before we create the pipeline, we have to linger on the `native-country`.
+# Before we create the pipeline, we have to focus on the `native-country`.
# Let's recall some statistics regarding this column.

# %%
@@ -329,9 +329,10 @@
print(f"The accuracy is: {scores.mean():.3f} ± {scores.std():.3f}")

# %% [markdown]
-# As you can see, this representation of the categorical variables is
-# slightly more predictive of the revenue than the numerical variables
-# that we used previously.
+# As you can see, this representation of the categorical variables is slightly
+# more predictive of the revenue than the numerical variables that we used
+# previously. The reason being that we have more (predictive) categorical
+# features than numerical ones.

# %% [markdown]
#
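For context, the pipeline whose cross-validated accuracy is printed in this hunk can be sketched roughly as follows; the dataset path, the dropped columns, and the estimator settings are assumptions made for illustration, not the notebook's exact code:

import pandas as pd
from sklearn.compose import make_column_selector as selector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumed dataset path and column names.
adult_census = pd.read_csv("../datasets/adult-census.csv")
target = adult_census["class"]
data = adult_census.drop(columns=["class", "education-num"])

# Keep only the categorical (object-dtype) columns.
categorical_columns = selector(dtype_include=object)(data)
data_categorical = data[categorical_columns]

# One-hot encode the categories, then fit a linear classifier.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"), LogisticRegression(max_iter=500)
)
cv_results = cross_validate(model, data_categorical, target)
scores = cv_results["test_score"]
print(f"The accuracy is: {scores.mean():.3f} ± {scores.std():.3f}")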
2 changes: 1 addition & 1 deletion python_scripts/cross_validation_train_test.py
@@ -12,7 +12,7 @@
# of predictive models. While this section could be slightly redundant, we
# intend to go into details into the cross-validation framework.
#
-# Before we dive in, let's linger on the reasons for always having training and
+# Before we dive in, let's focus on the reasons for always having training and
# testing sets. Let's first look at the limitation of using a dataset without
# keeping any samples out.
#
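The wording change above concerns the motivation for keeping a separate test set. A small, self-contained illustration of that point (the dataset and regressor are placeholders chosen for the sketch, not necessarily what the notebook uses):

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

data, target = fetch_california_housing(return_X_y=True)
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0
)

# A deep tree memorizes the training data: near-perfect training score,
# noticeably lower score on the held-out test set.
model = DecisionTreeRegressor(random_state=0).fit(data_train, target_train)
print(f"R2 on the training set: {model.score(data_train, target_train):.3f}")
print(f"R2 on the testing set:  {model.score(data_test, target_test):.3f}")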
7 changes: 7 additions & 0 deletions python_scripts/ensemble_sol_02.py
@@ -103,3 +103,10 @@

plt.plot(data_range[feature_name], forest_predictions, label="Random forest")
_ = plt.legend(bbox_to_anchor=(1.05, 0.8), loc="upper left")

+# %% [markdown] tags=["solution"]
+# The random forest reduces the overfitting of the individual trees but still
+# overfits itself. In the section on "hyperparameter tuning with ensemble
+# methods" we will see how to further mitigate this effect. Still, interested
+# users may increase the number of estimators in the forest and try different
+# values of, e.g., `min_samples_split`.
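The added paragraph suggests increasing the number of estimators and trying different values of `min_samples_split`. A rough way to run that experiment, on synthetic data rather than the exercise's dataset (an assumption made for brevity):

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the exercise's dataset.
X, y = make_regression(n_samples=500, n_features=10, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for min_samples_split in (2, 10, 50):
    forest = RandomForestRegressor(
        n_estimators=300, min_samples_split=min_samples_split, random_state=0
    ).fit(X_train, y_train)
    # A shrinking train/test gap indicates reduced overfitting.
    print(
        f"min_samples_split={min_samples_split}: "
        f"train R2={forest.score(X_train, y_train):.2f}, "
        f"test R2={forest.score(X_test, y_test):.2f}"
    )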
2 changes: 1 addition & 1 deletion python_scripts/linear_models_ex_04.py
@@ -17,7 +17,7 @@
# In the previous Module we tuned the hyperparameter `C` of the logistic
# regression without mentioning that it controls the regularization strength.
# Later, on the slides on 🎥 **Intuitions on regularized linear models** we
-# metioned that a small `C` provides a more regularized model, whereas a
+# mentioned that a small `C` provides a more regularized model, whereas a
# non-regularized model is obtained with an infinitely large value of `C`.
# Indeed, `C` behaves as the inverse of the `alpha` coefficient in the `Ridge`
# model.
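The corrected sentence states that a small `C` gives a more regularized model. A quick sanity check of that statement on synthetic data (the data and the coefficient-norm summary are illustrative choices, not part of the exercise):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for C in (1e-3, 1.0, 1e3):
    model = LogisticRegression(C=C).fit(X, y)
    # Stronger regularization (smaller C) shrinks the weights toward zero.
    print(f"C={C:g}: ||coef|| = {np.linalg.norm(model.coef_):.3f}")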
2 changes: 1 addition & 1 deletion python_scripts/linear_models_sol_04.py
@@ -11,7 +11,7 @@
# In the previous Module we tuned the hyperparameter `C` of the logistic
# regression without mentioning that it controls the regularization strength.
# Later, on the slides on 🎥 **Intuitions on regularized linear models** we
-# metioned that a small `C` provides a more regularized model, whereas a
+# mentioned that a small `C` provides a more regularized model, whereas a
# non-regularized model is obtained with an infinitely large value of `C`.
# Indeed, `C` behaves as the inverse of the `alpha` coefficient in the `Ridge`
# model.
5 changes: 3 additions & 2 deletions python_scripts/metrics_regression.py
@@ -97,8 +97,9 @@
# %% [markdown]
# The $R^2$ score represents the proportion of variance of the target that is
# explained by the independent variables in the model. The best score possible
-# is 1 but there is no lower bound. However, a model that predicts the expected
-# value of the target would get a score of 0.
+# is 1 but there is no lower bound. However, a model that predicts the [expected
+# value](https://en.wikipedia.org/wiki/Expected_value) of the target would get a
+# score of 0.

# %%
from sklearn.dummy import DummyRegressor
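The reworded sentence claims that a model predicting the expected value of the target gets an R2 score of 0, which is what the `DummyRegressor` imported next in the notebook demonstrates. A minimal sketch of that check (the dataset is an assumption chosen for illustration):

from sklearn.datasets import fetch_california_housing
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import train_test_split

data, target = fetch_california_housing(return_X_y=True)
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0
)

# Predicting the training mean for every sample yields an R2 close to 0.
dummy = DummyRegressor(strategy="mean").fit(data_train, target_train)
print(f"R2 of the mean predictor: {dummy.score(data_test, target_test):.3f}")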