[ci skip] MAINT Fix typos and wording across the mooc (#764)
Co-authored-by: ArturoAmorQ <[email protected]> 1237225
glemaitre committed Apr 26, 2024
1 parent e60b9bd commit 3ac8ce4
Showing 19 changed files with 320 additions and 304 deletions.
Original file line number Diff line number Diff line change
@@ -59,7 +59,7 @@
data

# %% [markdown]
-# We can now linger on the variables, also denominated features, that we later
+# We can now focus on the variables, also denominated features, that we later
# use to build our predictive model. In addition, we can also check how many
# samples are available in our dataset.

9 changes: 5 additions & 4 deletions _sources/python_scripts/03_categorical_pipeline.py
@@ -253,7 +253,7 @@
# and check the generalization performance of this machine learning pipeline using
# cross-validation.
#
-# Before we create the pipeline, we have to linger on the `native-country`.
+# Before we create the pipeline, we have to focus on the `native-country`.
# Let's recall some statistics regarding this column.

# %%
@@ -329,9 +329,10 @@
print(f"The accuracy is: {scores.mean():.3f} ± {scores.std():.3f}")

# %% [markdown]
-# As you can see, this representation of the categorical variables is
-# slightly more predictive of the revenue than the numerical variables
-# that we used previously.
+# As you can see, this representation of the categorical variables is slightly
+# more predictive of the revenue than the numerical variables that we used
+# previously. The reason being that we have more (predictive) categorical
+# features than numerical ones.
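As an aside, the one-hot representation this hunk refers to can be sketched in a few lines of plain Python. This is a simplified stand-in for scikit-learn's `OneHotEncoder`, not the course's code:

```python
def one_hot(values):
    # Map each category to its own binary column, as OneHotEncoder does;
    # this sketch ignores unknown-category handling and sparse output.
    categories = sorted(set(values))
    return [[1 if value == category else 0 for category in categories]
            for value in values]

columns = one_hot(["blue", "red", "blue"])
print(columns)  # [[1, 0], [0, 1], [1, 0]] -- categories are ['blue', 'red']
```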

# %% [markdown]
#
2 changes: 1 addition & 1 deletion _sources/python_scripts/cross_validation_train_test.py
@@ -12,7 +12,7 @@
# of predictive models. While this section could be slightly redundant, we
# intend to go into details into the cross-validation framework.
#
-# Before we dive in, let's linger on the reasons for always having training and
+# Before we dive in, let's focus on the reasons for always having training and
# testing sets. Let's first look at the limitation of using a dataset without
# keeping any samples out.
#
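To make the argument concrete, here is a minimal, hypothetical illustration (not part of the commit) of why evaluating on the training data is misleading: a model that simply memorizes its training set scores perfectly on it, regardless of how well it generalizes.

```python
def predict_memorized(train_X, train_y, x):
    # A 1-nearest-neighbor "model" that memorizes the training set.
    closest = min(range(len(train_X)), key=lambda i: abs(train_X[i] - x))
    return train_y[closest]

train_X = [0.0, 1.0, 2.0, 3.0]
train_y = [0, 1, 0, 1]

train_accuracy = sum(
    predict_memorized(train_X, train_y, x) == y
    for x, y in zip(train_X, train_y)
) / len(train_X)
print(train_accuracy)  # 1.0 -- perfect on its own training data, which says
                       # nothing about new samples; hence the held-out test set
```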
7 changes: 7 additions & 0 deletions _sources/python_scripts/ensemble_sol_02.py
@@ -103,3 +103,10 @@

plt.plot(data_range[feature_name], forest_predictions, label="Random forest")
_ = plt.legend(bbox_to_anchor=(1.05, 0.8), loc="upper left")

+# %% [markdown] tags=["solution"]
+# The random forest reduces the overfitting of the individual trees but still
+# overfits itself. In the section on "hyperparameter tuning with ensemble
+# methods" we will see how to further mitigate this effect. Still, interested
+# users may increase the number of estimators in the forest and try different
+# values of, e.g., `min_samples_split`.
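The point that the forest still overfits can be illustrated with a toy simulation (an assumption-laden sketch, not the course's code): averaging cancels tree-specific randomness, but not the noise all trees share from being fit on the same training sample.

```python
import random

random.seed(0)
TRUE_VALUE = 1.0

def simulate_forest(n_estimators):
    # "shared" mimics noise common to every tree (same training sample);
    # "own" mimics per-tree randomness, which averaging cancels out.
    shared = random.uniform(-0.5, 0.5)
    predictions = [
        TRUE_VALUE + shared + random.uniform(-0.5, 0.5)
        for _ in range(n_estimators)
    ]
    return sum(predictions) / n_estimators

errors = [abs(simulate_forest(200) - TRUE_VALUE) for _ in range(2000)]
mean_error = sum(errors) / len(errors)
print(mean_error > 0.1)  # True: the residual error stays at the scale of the
                         # shared noise, no matter how many trees we average
```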
2 changes: 1 addition & 1 deletion _sources/python_scripts/linear_models_ex_04.py
@@ -17,7 +17,7 @@
# In the previous Module we tuned the hyperparameter `C` of the logistic
# regression without mentioning that it controls the regularization strength.
# Later, on the slides on 🎥 **Intuitions on regularized linear models** we
-# metioned that a small `C` provides a more regularized model, whereas a
+# mentioned that a small `C` provides a more regularized model, whereas a
# non-regularized model is obtained with an infinitely large value of `C`.
# Indeed, `C` behaves as the inverse of the `alpha` coefficient in the `Ridge`
# model.
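A hypothetical numeric sketch (not from the commit) of this inverse relationship: the `Ridge` penalty `alpha * ||w||^2` matches the L2 penalty weight implied by `C` when `alpha = 1 / C`. The exact loss scaling in scikit-learn differs slightly, so treat this only as an intuition aid.

```python
def ridge_penalty(weights, alpha):
    # Ridge adds alpha * ||w||^2 to the data-fit term.
    return alpha * sum(w ** 2 for w in weights)

def logreg_penalty(weights, C):
    # LogisticRegression scales the data-fit term by C, which is
    # equivalent to an L2 penalty weight of 1 / C.
    return (1.0 / C) * sum(w ** 2 for w in weights)

weights = [0.5, -1.2, 2.0]
alpha = 4.0
print(ridge_penalty(weights, alpha) == logreg_penalty(weights, C=1.0 / alpha))
# True: a small C (here C = 0.25) means strong regularization, like a large alpha
```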
2 changes: 1 addition & 1 deletion _sources/python_scripts/linear_models_sol_04.py
@@ -11,7 +11,7 @@
# In the previous Module we tuned the hyperparameter `C` of the logistic
# regression without mentioning that it controls the regularization strength.
# Later, on the slides on 🎥 **Intuitions on regularized linear models** we
-# metioned that a small `C` provides a more regularized model, whereas a
+# mentioned that a small `C` provides a more regularized model, whereas a
# non-regularized model is obtained with an infinitely large value of `C`.
# Indeed, `C` behaves as the inverse of the `alpha` coefficient in the `Ridge`
# model.
5 changes: 3 additions & 2 deletions _sources/python_scripts/metrics_regression.py
@@ -97,8 +97,9 @@
# %% [markdown]
# The $R^2$ score represents the proportion of variance of the target that is
# explained by the independent variables in the model. The best score possible
-# is 1 but there is no lower bound. However, a model that predicts the expected
-# value of the target would get a score of 0.
+# is 1 but there is no lower bound. However, a model that predicts the [expected
+# value](https://en.wikipedia.org/wiki/Expected_value) of the target would get a
+# score of 0.
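To make the score-of-0 claim concrete, here is a small hand-rolled computation (a sketch mirroring what `r2_score` and the `DummyRegressor` below compute):

```python
def r2_score(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the target mean.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
mean_prediction = [sum(y) / len(y)] * len(y)  # what DummyRegressor(strategy="mean") predicts
print(r2_score(y, mean_prediction))  # 0.0: predicting the expected value
print(r2_score(y, [9.0, -9.0, 9.0, -9.0]))  # strongly negative: no lower bound
```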

# %%
from sklearn.dummy import DummyRegressor
52 changes: 26 additions & 26 deletions appendix/notebook_timings.html
@@ -668,9 +668,9 @@ <h1>Notebook timings<a class="headerlink" href="#notebook-timings" title="Permal
</thead>
<tbody>
<tr class="row-even"><td><p><a class="xref doc reference internal" href="../python_scripts/01_tabular_data_exploration.html"><span class="doc">python_scripts/01_tabular_data_exploration</span></a></p></td>
-<td><p>2024-04-26 13:50</p></td>
+<td><p>2024-04-26 13:51</p></td>
 <td><p>cache</p></td>
-<td><p>8.22</p></td>
+<td><p>7.83</p></td>
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p><a class="xref doc reference internal" href="../python_scripts/01_tabular_data_exploration_ex_01.html"><span class="doc">python_scripts/01_tabular_data_exploration_ex_01</span></a></p></td>
@@ -704,15 +704,15 @@ <h1>Notebook timings<a class="headerlink" href="#notebook-timings" title="Permal
<td><p></p></td>
</tr>
<tr class="row-even"><td><p><a class="xref doc reference internal" href="../python_scripts/02_numerical_pipeline_hands_on.html"><span class="doc">python_scripts/02_numerical_pipeline_hands_on</span></a></p></td>
-<td><p>2024-04-26 13:50</p></td>
+<td><p>2024-04-26 13:51</p></td>
 <td><p>cache</p></td>
-<td><p>2.01</p></td>
+<td><p>1.98</p></td>
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p><a class="xref doc reference internal" href="../python_scripts/02_numerical_pipeline_introduction.html"><span class="doc">python_scripts/02_numerical_pipeline_introduction</span></a></p></td>
-<td><p>2024-04-26 13:50</p></td>
+<td><p>2024-04-26 13:51</p></td>
 <td><p>cache</p></td>
-<td><p>4.8</p></td>
+<td><p>5.06</p></td>
<td><p></p></td>
</tr>
<tr class="row-even"><td><p><a class="xref doc reference internal" href="../python_scripts/02_numerical_pipeline_scaling.html"><span class="doc">python_scripts/02_numerical_pipeline_scaling</span></a></p></td>
@@ -734,15 +734,15 @@ <h1>Notebook timings<a class="headerlink" href="#notebook-timings" title="Permal
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p><a class="xref doc reference internal" href="../python_scripts/03_categorical_pipeline.html"><span class="doc">python_scripts/03_categorical_pipeline</span></a></p></td>
-<td><p>2024-04-26 13:50</p></td>
+<td><p>2024-04-26 13:51</p></td>
 <td><p>cache</p></td>
-<td><p>2.8</p></td>
+<td><p>3.06</p></td>
<td><p></p></td>
</tr>
<tr class="row-even"><td><p><a class="xref doc reference internal" href="../python_scripts/03_categorical_pipeline_column_transformer.html"><span class="doc">python_scripts/03_categorical_pipeline_column_transformer</span></a></p></td>
-<td><p>2024-04-26 13:50</p></td>
+<td><p>2024-04-26 13:52</p></td>
 <td><p>cache</p></td>
-<td><p>4.23</p></td>
+<td><p>4.42</p></td>
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p><a class="xref doc reference internal" href="../python_scripts/03_categorical_pipeline_ex_01.html"><span class="doc">python_scripts/03_categorical_pipeline_ex_01</span></a></p></td>
@@ -836,9 +836,9 @@ <h1>Notebook timings<a class="headerlink" href="#notebook-timings" title="Permal
<td><p></p></td>
</tr>
<tr class="row-even"><td><p><a class="xref doc reference internal" href="../python_scripts/cross_validation_train_test.html"><span class="doc">python_scripts/cross_validation_train_test</span></a></p></td>
-<td><p>2024-04-26 13:51</p></td>
+<td><p>2024-04-26 13:52</p></td>
 <td><p>cache</p></td>
-<td><p>10.87</p></td>
+<td><p>11.39</p></td>
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p><a class="xref doc reference internal" href="../python_scripts/cross_validation_validation_curve.html"><span class="doc">python_scripts/cross_validation_validation_curve</span></a></p></td>
@@ -1004,9 +1004,9 @@ <h1>Notebook timings<a class="headerlink" href="#notebook-timings" title="Permal
<td><p></p></td>
</tr>
<tr class="row-even"><td><p><a class="xref doc reference internal" href="../python_scripts/linear_models_ex_02.html"><span class="doc">python_scripts/linear_models_ex_02</span></a></p></td>
-<td><p>2024-04-26 13:51</p></td>
+<td><p>2024-04-26 13:52</p></td>
 <td><p>cache</p></td>
-<td><p>1.09</p></td>
+<td><p>1.17</p></td>
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p><a class="xref doc reference internal" href="../python_scripts/linear_models_ex_03.html"><span class="doc">python_scripts/linear_models_ex_03</span></a></p></td>
@@ -1040,9 +1040,9 @@ <h1>Notebook timings<a class="headerlink" href="#notebook-timings" title="Permal
<td><p></p></td>
</tr>
<tr class="row-even"><td><p><a class="xref doc reference internal" href="../python_scripts/linear_models_sol_02.html"><span class="doc">python_scripts/linear_models_sol_02</span></a></p></td>
-<td><p>2024-04-26 13:51</p></td>
+<td><p>2024-04-26 13:52</p></td>
 <td><p>cache</p></td>
-<td><p>6.1</p></td>
+<td><p>6.45</p></td>
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p><a class="xref doc reference internal" href="../python_scripts/linear_models_sol_03.html"><span class="doc">python_scripts/linear_models_sol_03</span></a></p></td>
@@ -1070,9 +1070,9 @@ <h1>Notebook timings<a class="headerlink" href="#notebook-timings" title="Permal
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p><a class="xref doc reference internal" href="../python_scripts/linear_regression_without_sklearn.html"><span class="doc">python_scripts/linear_regression_without_sklearn</span></a></p></td>
-<td><p>2024-04-26 13:51</p></td>
+<td><p>2024-04-26 13:52</p></td>
 <td><p>cache</p></td>
-<td><p>2.65</p></td>
+<td><p>2.99</p></td>
<td><p></p></td>
</tr>
<tr class="row-even"><td><p><a class="xref doc reference internal" href="../python_scripts/logistic_regression.html"><span class="doc">python_scripts/logistic_regression</span></a></p></td>
@@ -1130,15 +1130,15 @@ <h1>Notebook timings<a class="headerlink" href="#notebook-timings" title="Permal
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p><a class="xref doc reference internal" href="../python_scripts/parameter_tuning_grid_search.html"><span class="doc">python_scripts/parameter_tuning_grid_search</span></a></p></td>
-<td><p>2024-04-26 13:51</p></td>
+<td><p>2024-04-26 13:52</p></td>
 <td><p>cache</p></td>
-<td><p>10.21</p></td>
+<td><p>10.5</p></td>
<td><p></p></td>
</tr>
<tr class="row-even"><td><p><a class="xref doc reference internal" href="../python_scripts/parameter_tuning_manual.html"><span class="doc">python_scripts/parameter_tuning_manual</span></a></p></td>
-<td><p>2024-04-26 13:51</p></td>
+<td><p>2024-04-26 13:52</p></td>
 <td><p>cache</p></td>
-<td><p>4.17</p></td>
+<td><p>4.45</p></td>
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p><a class="xref doc reference internal" href="../python_scripts/parameter_tuning_nested.html"><span class="doc">python_scripts/parameter_tuning_nested</span></a></p></td>
@@ -1154,9 +1154,9 @@ <h1>Notebook timings<a class="headerlink" href="#notebook-timings" title="Permal
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p><a class="xref doc reference internal" href="../python_scripts/parameter_tuning_randomized_search.html"><span class="doc">python_scripts/parameter_tuning_randomized_search</span></a></p></td>
-<td><p>2024-04-26 13:51</p></td>
+<td><p>2024-04-26 13:53</p></td>
 <td><p>cache</p></td>
-<td><p>24.21</p></td>
+<td><p>22.88</p></td>
<td><p></p></td>
</tr>
<tr class="row-even"><td><p><a class="xref doc reference internal" href="../python_scripts/parameter_tuning_sol_02.html"><span class="doc">python_scripts/parameter_tuning_sol_02</span></a></p></td>
@@ -1178,9 +1178,9 @@ <h1>Notebook timings<a class="headerlink" href="#notebook-timings" title="Permal
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p><a class="xref doc reference internal" href="../python_scripts/trees_dataset.html"><span class="doc">python_scripts/trees_dataset</span></a></p></td>
-<td><p>2024-04-26 13:51</p></td>
+<td><p>2024-04-26 13:53</p></td>
 <td><p>cache</p></td>
-<td><p>2.75</p></td>
+<td><p>3.06</p></td>
<td><p></p></td>
</tr>
<tr class="row-even"><td><p><a class="xref doc reference internal" href="../python_scripts/trees_ex_01.html"><span class="doc">python_scripts/trees_ex_01</span></a></p></td>
2 changes: 1 addition & 1 deletion python_scripts/02_numerical_pipeline_introduction.html
@@ -1003,7 +1003,7 @@ <h2>Separate the data and the target<a class="headerlink" href="#separate-the-da
<p>39073 rows × 4 columns</p>
</div></div></div>
</div>
-<p>We can now linger on the variables, also denominated features, that we later
+<p>We can now focus on the variables, also denominated features, that we later
use to build our predictive model. In addition, we can also check how many
samples are available in our dataset.</p>
<div class="cell docutils container">
13 changes: 7 additions & 6 deletions python_scripts/03_categorical_pipeline.html
@@ -1958,7 +1958,7 @@ <h2>Evaluate our predictive pipeline<a class="headerlink" href="#evaluate-our-pr
did with numerical data: let’s train a linear classifier on the encoded data
and check the generalization performance of this machine learning pipeline using
cross-validation.</p>
-<p>Before we create the pipeline, we have to linger on the <code class="docutils literal notranslate"><span class="pre">native-country</span></code>.
+<p>Before we create the pipeline, we have to focus on the <code class="docutils literal notranslate"><span class="pre">native-country</span></code>.
Let’s recall some statistics regarding this column.</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
@@ -2078,8 +2078,8 @@ <h2>Evaluate our predictive pipeline<a class="headerlink" href="#evaluate-our-pr
</div>
</div>
<div class="cell_output docutils container">
-<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>{&#39;fit_time&#39;: array([0.18181372, 0.16217852, 0.17221045, 0.17812014, 0.16503692]),
-&#39;score_time&#39;: array([0.02207351, 0.02198744, 0.02215958, 0.02402902, 0.02280927]),
+<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>{&#39;fit_time&#39;: array([0.18064904, 0.16906261, 0.17876267, 0.20569158, 0.17099452]),
+&#39;score_time&#39;: array([0.02239752, 0.0232501 , 0.02577472, 0.02373815, 0.02260184]),
&#39;test_score&#39;: array([0.83232675, 0.83570478, 0.82831695, 0.83292383, 0.83497133])}
</pre></div>
</div>
@@ -2098,9 +2098,10 @@ <h2>Evaluate our predictive pipeline<a class="headerlink" href="#evaluate-our-pr
</div>
</div>
</div>
-<p>As you can see, this representation of the categorical variables is
-slightly more predictive of the revenue than the numerical variables
-that we used previously.</p>
+<p>As you can see, this representation of the categorical variables is slightly
+more predictive of the revenue than the numerical variables that we used
+previously. The reason being that we have more (predictive) categorical
+features than numerical ones.</p>
<p>In this notebook we have:</p>
<ul class="simple">
<li><p>seen two common strategies for encoding categorical features: <strong>ordinal
10 changes: 5 additions & 5 deletions python_scripts/03_categorical_pipeline_column_transformer.html
@@ -1571,8 +1571,8 @@ <h2>Evaluation of the model with cross-validation<a class="headerlink" href="#ev
</div>
</div>
<div class="cell_output docutils container">
-<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>{&#39;fit_time&#39;: array([0.24788404, 0.24504352, 0.22222066, 0.23252439, 0.26179743]),
-&#39;score_time&#39;: array([0.02705979, 0.0278194 , 0.02626395, 0.02863431, 0.02582169]),
+<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>{&#39;fit_time&#39;: array([0.25630689, 0.26000094, 0.22319031, 0.24449325, 0.26766682]),
+&#39;score_time&#39;: array([0.02926874, 0.02974772, 0.02790833, 0.02923989, 0.02736449]),
&#39;test_score&#39;: array([0.85116184, 0.84993346, 0.8482801 , 0.85257985, 0.85544636])}
</pre></div>
</div>
@@ -1644,8 +1644,8 @@ <h2>Fitting a more powerful model<a class="headerlink" href="#fitting-a-more-pow
</div>
</div>
<div class="cell_output docutils container">
-<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>CPU times: user 657 ms, sys: 15.8 ms, total: 672 ms
-Wall time: 672 ms
+<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>CPU times: user 680 ms, sys: 12 ms, total: 692 ms
+Wall time: 692 ms
</pre></div>
</div>
</div>
@@ -1657,7 +1657,7 @@ <h2>Fitting a more powerful model<a class="headerlink" href="#fitting-a-more-pow
</div>
</div>
<div class="cell_output docutils container">
-<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>0.881008926377856
+<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>0.8805994595037262
</pre></div>
</div>
</div>
