Apply suggestions from code review
Co-authored-by: Arturo Amor <[email protected]>
PatriOr and ArturoAmorQ authored Nov 6, 2023
1 parent 06ee940 commit 246e866
Showing 6 changed files with 15 additions and 15 deletions.
python_scripts/parameter_tuning_ex_02.py (4 changes: 2 additions & 2 deletions)
@@ -68,9 +68,9 @@
 # %% [markdown]
 # Use the previously defined model (called `model`) and using two nested `for`
 # loops, make a search of the best combinations of the `learning_rate` and
-# `max_leaf_nodes` parameters. In this regard, you need to train and test the
+# `max_leaf_nodes` parameters. In this regard, you have to train and test the
 # model by setting the parameters. The evaluation of the model should be
-# performed using `cross_val_score` on the training set. We use the following
+# performed using `cross_val_score` on the training set. Use the following
 # parameters search:
 # - `learning_rate` for the values 0.01, 0.1, 1 and 10. This parameter controls
 #   the ability of a new tree to correct the error of the previous sequence of
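For readers following along, here is a minimal runnable sketch of the nested-loop search this exercise describes. It is not the course's reference solution: the estimator and synthetic data stand in for the notebook's `model` and training split, and the `max_leaf_nodes` grid is cut off in this diff, so the values 3, 10 and 30 are assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data; the notebook uses the adult census dataset instead.
data, target = make_classification(n_samples=1_000, random_state=0)

best_score, best_params = -float("inf"), None
for learning_rate in (0.01, 0.1, 1, 10):
    for max_leaf_nodes in (3, 10, 30):  # assumed grid; truncated in the diff
        model = HistGradientBoostingClassifier(
            learning_rate=learning_rate, max_leaf_nodes=max_leaf_nodes
        )
        scores = cross_val_score(model, data, target, cv=5)
        if scores.mean() > best_score:
            best_score = scores.mean()
            best_params = (learning_rate, max_leaf_nodes)

print(f"best mean CV score: {best_score:.3f} with "
      f"learning_rate={best_params[0]}, max_leaf_nodes={best_params[1]}")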
python_scripts/parameter_tuning_grid_search.py (2 changes: 1 addition & 1 deletion)
@@ -49,7 +49,7 @@
 )

 # %% [markdown]
-# We define a pipeline as seen in the first module. It handle both numerical and
+# We define a pipeline as seen in the first module, to handle both numerical and
 # categorical features.
 #
 # The first step is to select all the categorical columns.
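As an illustration, a pipeline of this shape could look as follows; the column selector and the choice of encoder/estimator are assumptions in the spirit of the course, not a quote of the notebook.

from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Select the categorical columns by dtype; numerical columns pass through.
categorical_columns = make_column_selector(dtype_include=object)
preprocessor = ColumnTransformer(
    [("categorical", OrdinalEncoder(handle_unknown="use_encoded_value",
                                    unknown_value=-1), categorical_columns)],
    remainder="passthrough",
)
model = Pipeline([
    ("preprocessor", preprocessor),
    ("classifier", HistGradientBoostingClassifier(random_state=42)),
])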
python_scripts/parameter_tuning_sol_02.py (6 changes: 3 additions & 3 deletions)
@@ -62,9 +62,9 @@
 # %% [markdown]
 # Use the previously defined model (called `model`) and using two nested `for`
 # loops, make a search of the best combinations of the `learning_rate` and
-# `max_leaf_nodes` parameters. In this regard, you will need to train and test
-# the model by setting the parameters. The evaluation of the model should be
-# performed using `cross_val_score` on the training set. We use the following
+# `max_leaf_nodes` parameters. In this regard, you need to train and test the
+# model by setting the parameters. The evaluation of the model should be
+# performed using `cross_val_score` on the training set. Use the following
 # parameters search:
 # - `learning_rate` for the values 0.01, 0.1, 1 and 10. This parameter controls
 #   the ability of a new tree to correct the error of the previous sequence of
python_scripts/trees_hyperparameters.py (6 changes: 3 additions & 3 deletions)
@@ -136,7 +136,7 @@ def fit_and_plot_regression(model, data, feature_names, target_names):
 # %% [markdown]
 # For both classification and regression setting, we observe that increasing the
 # depth makes the tree model more expressive. However, a tree that is too deep
-# overfits the training data, creating partitions which are only correct for
+# may overfit the training data, creating partitions which are only correct for
 # "outliers" (noisy samples). The `max_depth` is one of the hyperparameters that
 # one should optimize via cross-validation and grid-search.

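A compact way to do that tuning, sketched on synthetic data rather than the notebook's dataset:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

data, target = make_classification(n_samples=500, random_state=0)

# Grid-search the depth with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 5, 10, 20]},
    cv=5,
)
search.fit(data, target)
print("best max_depth:", search.best_params_["max_depth"])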
@@ -172,7 +172,7 @@ def fit_and_plot_regression(model, data, feature_names, target_names):
 #
 # The `max_depth` hyperparameter controls the overall complexity of the tree.
 # This parameter is adequate under the assumption that a tree is built
-# symmetrically. However, there is no guarantee that a tree is symmetrical.
+# symmetrically. However, there is no reason why a tree should be symmetrical.
 # Indeed, optimal generalization performance could be reached by growing some of
 # the branches deeper than some others.
 #
@@ -192,7 +192,7 @@ def fit_and_plot_regression(model, data, feature_names, target_names):
 X_1, y_1 = make_blobs(
     n_samples=300, centers=[[0, 0], [-1, -1]], random_state=0
 )
-# Blobs that are easily separated
+# Blobs that can be easily separated
 X_2, y_2 = make_blobs(n_samples=300, centers=[[3, 6], [7, 0]], random_state=0)

 X = np.concatenate([X_1, X_2], axis=0)
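The rest of this cell is cut off in the diff. As a standalone illustration of the asymmetry point above (not the notebook's code), a tree constrained by `max_leaf_nodes` instead of `max_depth` can spend its splits where the classes overlap:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier

X_1, y_1 = make_blobs(n_samples=300, centers=[[0, 0], [-1, -1]], random_state=0)
X_2, y_2 = make_blobs(n_samples=300, centers=[[3, 6], [7, 0]], random_state=0)
X = np.concatenate([X_1, X_2], axis=0)
y = np.concatenate([y_1, y_2 + 2])  # four classes in total

# Limiting leaves (not depth) lets branches over the overlapping blobs grow deeper.
tree = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0).fit(X, y)
print("depth reached:", tree.get_depth(), "| leaves:", tree.get_n_leaves())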
python_scripts/trees_regression.py (10 changes: 5 additions & 5 deletions)
@@ -31,9 +31,9 @@
 data_train, target_train = penguins[[feature_name]], penguins[target_name]

 # %% [markdown]
-# To illustrate how decision trees are predicting in a regression setting, we
-# create a synthetic dataset containing all possible flipper length from the
-# minimum to the maximum of the original data.
+# To illustrate how decision trees predict in a regression setting, we create a
+# synthetic dataset containing some of the possible flipper length values
+# between the minimum and the maximum of the original data.

 # %%
 import numpy as np
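The cell is truncated here. A sketch of such a synthetic test set, using a stand-in training frame and a `feature_name` assumed to match the course's penguins notebooks:

import numpy as np
import pandas as pd

feature_name = "Flipper Length (mm)"  # assumed; the diff does not show it
data_train = pd.DataFrame({feature_name: [172.0, 181.0, 210.0, 231.0]})  # stand-in

# Evenly spaced flipper lengths spanning the observed range.
data_test = pd.DataFrame(
    {feature_name: np.linspace(data_train[feature_name].min(),
                               data_train[feature_name].max(), num=300)}
)
print(data_test.head())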
@@ -55,7 +55,7 @@
 #
 # However, computing an evaluation metric on such a synthetic test set would be
 # meaningless since the synthetic dataset does not follow the same distribution
-# as the real world data on which the model is deployed.
+# as the real world data on which the model would be deployed.

 # %%
 import matplotlib.pyplot as plt
@@ -169,7 +169,7 @@
 _ = plt.title("Prediction function using a DecisionTreeRegressor")

 # %% [markdown]
-# Increasing the depth of the tree increases the number of partition and thus
+# Increasing the depth of the tree increases the number of partitions and thus
 # the number of constant values that the tree is capable of predicting.
 #
 # In this notebook, we highlighted the differences in behavior of a decision
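A quick standalone check of that relationship between depth and the number of constant predictions, on synthetic data rather than the penguins set:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 1))
y = np.sin(4 * X.ravel()) + rng.normal(scale=0.1, size=200)

for depth in (1, 3, 5):
    tree = DecisionTreeRegressor(max_depth=depth).fit(X, y)
    n_distinct = len(np.unique(tree.predict(X)))
    print(f"max_depth={depth}: {tree.get_n_leaves()} leaves, "
          f"{n_distinct} distinct predicted values")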
python_scripts/trees_sol_02.py (2 changes: 1 addition & 1 deletion)
@@ -132,7 +132,7 @@

 # %% [markdown] tags=["solution"]
 # The linear model extrapolates using the fitted model for flipper lengths < 175
-# mm and > 235 mm. In fact, we are using the model parametrization to make this
+# mm and > 235 mm. In fact, we are using the model parametrization to make these
 # predictions.
 #
 # As mentioned, decision trees are non-parametric models and we observe that
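To see the contrast the solution describes, here is a toy illustration (not the notebook's code): a linear model keeps its slope outside the training range, while a tree predicts the constant value of its nearest leaf.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Mimic the flipper-length range mentioned above (175 to 235 mm).
X_train = np.linspace(175, 235, 50).reshape(-1, 1)
y_train = 0.5 * X_train.ravel() + 10

linear = LinearRegression().fit(X_train, y_train)
tree = DecisionTreeRegressor(max_depth=3).fit(X_train, y_train)

X_out = np.array([[150.0], [260.0]])  # outside the training range
print("linear:", linear.predict(X_out))  # extrapolates along the fitted line
print("tree:  ", tree.predict(X_out))    # constant, clipped to boundary leaves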
