Apply suggestions from code review
Co-authored-by: Arturo Amor <[email protected]>
PatriOr and ArturoAmorQ authored Nov 6, 2023
1 parent 06ee940 commit 246e866
Showing 6 changed files with 15 additions and 15 deletions.
python_scripts/parameter_tuning_ex_02.py (4 changes: 2 additions & 2 deletions)
@@ -68,9 +68,9 @@
 # %% [markdown]
 # Use the previously defined model (called `model`) and using two nested `for`
 # loops, make a search of the best combinations of the `learning_rate` and
-# `max_leaf_nodes` parameters. In this regard, you need to train and test the
+# `max_leaf_nodes` parameters. In this regard, you have to train and test the
 # model by setting the parameters. The evaluation of the model should be
-# performed using `cross_val_score` on the training set. We use the following
+# performed using `cross_val_score` on the training set. Use the following
 # parameters search:
 # - `learning_rate` for the values 0.01, 0.1, 1 and 10. This parameter controls
 #   the ability of a new tree to correct the error of the previous sequence of
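For readers following along, here is a minimal runnable sketch of the nested-loop search this exercise describes. It is not the course's reference solution: the estimator and synthetic data stand in for the notebook's `model` and training split, and the `max_leaf_nodes` grid is cut off in this diff, so the values 3, 10 and 30 are assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data; the notebook uses the adult census dataset instead.
data, target = make_classification(n_samples=1_000, random_state=0)

best_score, best_params = -float("inf"), None
for learning_rate in (0.01, 0.1, 1, 10):
    for max_leaf_nodes in (3, 10, 30):  # assumed grid; truncated in the diff
        model = HistGradientBoostingClassifier(
            learning_rate=learning_rate, max_leaf_nodes=max_leaf_nodes
        )
        scores = cross_val_score(model, data, target, cv=5)
        if scores.mean() > best_score:
            best_score = scores.mean()
            best_params = (learning_rate, max_leaf_nodes)

print(f"best mean CV score: {best_score:.3f} with "
      f"learning_rate={best_params[0]}, max_leaf_nodes={best_params[1]}")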
python_scripts/parameter_tuning_grid_search.py (2 changes: 1 addition & 1 deletion)
@@ -49,7 +49,7 @@
 )

 # %% [markdown]
-# We define a pipeline as seen in the first module. It handle both numerical and
+# We define a pipeline as seen in the first module, to handle both numerical and
 # categorical features.
 #
 # The first step is to select all the categorical columns.
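As an illustration, a pipeline of this shape could look as follows; the column selector and the choice of encoder/estimator are assumptions in the spirit of the course, not a quote of the notebook.

from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Select the categorical columns by dtype; numerical columns pass through.
categorical_columns = make_column_selector(dtype_include=object)
preprocessor = ColumnTransformer(
    [("categorical", OrdinalEncoder(handle_unknown="use_encoded_value",
                                    unknown_value=-1), categorical_columns)],
    remainder="passthrough",
)
model = Pipeline([
    ("preprocessor", preprocessor),
    ("classifier", HistGradientBoostingClassifier(random_state=42)),
])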
python_scripts/parameter_tuning_sol_02.py (6 changes: 3 additions & 3 deletions)
@@ -62,9 +62,9 @@
 # %% [markdown]
 # Use the previously defined model (called `model`) and using two nested `for`
 # loops, make a search of the best combinations of the `learning_rate` and
-# `max_leaf_nodes` parameters. In this regard, you will need to train and test
-# the model by setting the parameters. The evaluation of the model should be
-# performed using `cross_val_score` on the training set. We use the following
+# `max_leaf_nodes` parameters. In this regard, you need to train and test the
+# model by setting the parameters. The evaluation of the model should be
+# performed using `cross_val_score` on the training set. Use the following
 # parameters search:
 # - `learning_rate` for the values 0.01, 0.1, 1 and 10. This parameter controls
 #   the ability of a new tree to correct the error of the previous sequence of
python_scripts/trees_hyperparameters.py (6 changes: 3 additions & 3 deletions)
@@ -136,7 +136,7 @@ def fit_and_plot_regression(model, data, feature_names, target_names):
 # %% [markdown]
 # For both classification and regression setting, we observe that increasing the
 # depth makes the tree model more expressive. However, a tree that is too deep
-# overfits the training data, creating partitions which are only correct for
+# may overfit the training data, creating partitions which are only correct for
 # "outliers" (noisy samples). The `max_depth` is one of the hyperparameters that
 # one should optimize via cross-validation and grid-search.

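A compact way to do that tuning, sketched on synthetic data rather than the notebook's dataset:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

data, target = make_classification(n_samples=500, random_state=0)

# Grid-search the depth with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 5, 10, 20]},
    cv=5,
)
search.fit(data, target)
print("best max_depth:", search.best_params_["max_depth"])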
@@ -172,7 +172,7 @@ def fit_and_plot_regression(model, data, feature_names, target_names):
 #
 # The `max_depth` hyperparameter controls the overall complexity of the tree.
 # This parameter is adequate under the assumption that a tree is built
-# symmetrically. However, there is no guarantee that a tree is symmetrical.
+# symmetrically. However, there is no reason why a tree should be symmetrical.
 # Indeed, optimal generalization performance could be reached by growing some of
 # the branches deeper than some others.
 #
@@ -192,7 +192,7 @@ def fit_and_plot_regression(model, data, feature_names, target_names):
 X_1, y_1 = make_blobs(
     n_samples=300, centers=[[0, 0], [-1, -1]], random_state=0
 )
-# Blobs that are easily separated
+# Blobs that can be easily separated
 X_2, y_2 = make_blobs(n_samples=300, centers=[[3, 6], [7, 0]], random_state=0)

 X = np.concatenate([X_1, X_2], axis=0)
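The rest of this cell is cut off in the diff. As a standalone illustration of the asymmetry point above (not the notebook's code), a tree constrained by `max_leaf_nodes` instead of `max_depth` can spend its splits where the classes overlap:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier

X_1, y_1 = make_blobs(n_samples=300, centers=[[0, 0], [-1, -1]], random_state=0)
X_2, y_2 = make_blobs(n_samples=300, centers=[[3, 6], [7, 0]], random_state=0)
X = np.concatenate([X_1, X_2], axis=0)
y = np.concatenate([y_1, y_2 + 2])  # four classes in total

# Limiting leaves (not depth) lets branches over the overlapping blobs grow deeper.
tree = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0).fit(X, y)
print("depth reached:", tree.get_depth(), "| leaves:", tree.get_n_leaves())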
python_scripts/trees_regression.py (10 changes: 5 additions & 5 deletions)
@@ -31,9 +31,9 @@
 data_train, target_train = penguins[[feature_name]], penguins[target_name]

 # %% [markdown]
-# To illustrate how decision trees are predicting in a regression setting, we
-# create a synthetic dataset containing all possible flipper length from the
-# minimum to the maximum of the original data.
+# To illustrate how decision trees predict in a regression setting, we create a
+# synthetic dataset containing some of the possible flipper length values
+# between the minimum and the maximum of the original data.

 # %%
 import numpy as np
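The cell is truncated here. A sketch of such a synthetic test set, using a stand-in training frame and a `feature_name` assumed to match the course's penguins notebooks:

import numpy as np
import pandas as pd

feature_name = "Flipper Length (mm)"  # assumed; the diff does not show it
data_train = pd.DataFrame({feature_name: [172.0, 181.0, 210.0, 231.0]})  # stand-in

# Evenly spaced flipper lengths spanning the observed range.
data_test = pd.DataFrame(
    {feature_name: np.linspace(data_train[feature_name].min(),
                               data_train[feature_name].max(), num=300)}
)
print(data_test.head())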
@@ -55,7 +55,7 @@
 #
 # However, computing an evaluation metric on such a synthetic test set would be
 # meaningless since the synthetic dataset does not follow the same distribution
-# as the real world data on which the model is deployed.
+# as the real world data on which the model would be deployed.

 # %%
 import matplotlib.pyplot as plt
@@ -169,7 +169,7 @@
 _ = plt.title("Prediction function using a DecisionTreeRegressor")

 # %% [markdown]
-# Increasing the depth of the tree increases the number of partition and thus
+# Increasing the depth of the tree increases the number of partitions and thus
 # the number of constant values that the tree is capable of predicting.
 #
 # In this notebook, we highlighted the differences in behavior of a decision
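A quick standalone check of that relationship between depth and the number of constant predictions, on synthetic data rather than the penguins set:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 1))
y = np.sin(4 * X.ravel()) + rng.normal(scale=0.1, size=200)

for depth in (1, 3, 5):
    tree = DecisionTreeRegressor(max_depth=depth).fit(X, y)
    n_distinct = len(np.unique(tree.predict(X)))
    print(f"max_depth={depth}: {tree.get_n_leaves()} leaves, "
          f"{n_distinct} distinct predicted values")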
python_scripts/trees_sol_02.py (2 changes: 1 addition & 1 deletion)
@@ -132,7 +132,7 @@

 # %% [markdown] tags=["solution"]
 # The linear model extrapolates using the fitted model for flipper lengths < 175
-# mm and > 235 mm. In fact, we are using the model parametrization to make this
+# mm and > 235 mm. In fact, we are using the model parametrization to make these
 # predictions.
 #
 # As mentioned, decision trees are non-parametric models and we observe that
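To see the contrast the solution describes, here is a toy illustration (not the notebook's code): a linear model keeps its slope outside the training range, while a tree predicts the constant value of its nearest leaf.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Mimic the flipper-length range mentioned above (175 to 235 mm).
X_train = np.linspace(175, 235, 50).reshape(-1, 1)
y_train = 0.5 * X_train.ravel() + 10

linear = LinearRegression().fit(X_train, y_train)
tree = DecisionTreeRegressor(max_depth=3).fit(X_train, y_train)

X_out = np.array([[150.0], [260.0]])  # outside the training range
print("linear:", linear.predict(X_out))  # extrapolates along the fitted line
print("tree:  ", tree.predict(X_out))    # constant, clipped to boundary leaves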
