[ci skip] ENH Mention scaling behavior of binning and splines (#739)

Co-authored-by: ArturoAmorQ <[email protected]> Co-authored-by: Olivier Grisel <[email protected]> 767499b
INRIA · Oct 26, 2023 · e0570d1
1 parent 442080b
commit e0570d1

Showing 10 changed files with 35 additions and 18 deletions.
diff --git a/_images/65904496ae44a4a185e7c69818a8751bd821541b100b26091cf76db157f5a3f6.png b/_images/65904496ae44a4a185e7c69818a8751bd821541b100b26091cf76db157f5a3f6.png
diff --git a/_images/96da411b5c4ebfaa3ebeb9c05c1fa91e8164f132b58558585e11ad4b7d55a671.png b/_images/96da411b5c4ebfaa3ebeb9c05c1fa91e8164f132b58558585e11ad4b7d55a671.png
diff --git a/_images/a830e52976bbc92558a4865dcfe3ea4ee88afda4db3236c609b4f65dca9f6558.png b/_images/a830e52976bbc92558a4865dcfe3ea4ee88afda4db3236c609b4f65dca9f6558.png
diff --git a/_images/aa30de15e213b5786f4300f81791f9ae43dbe0b3edb40fcea1c7d5ec46154031.png b/_images/aa30de15e213b5786f4300f81791f9ae43dbe0b3edb40fcea1c7d5ec46154031.png
diff --git a/_images/d98cf10afde42d00cba794b3555d7e9ba000cebdef829967a70a33ccba1b60db.png b/_images/d98cf10afde42d00cba794b3555d7e9ba000cebdef829967a70a33ccba1b60db.png
diff --git a/_images/fb409cbf68b13df4149fba8f25d820751b1e5ad85c2612c3fe73045a40c0c004.png b/_images/fb409cbf68b13df4149fba8f25d820751b1e5ad85c2612c3fe73045a40c0c004.png
diff --git a/_sources/python_scripts/linear_models_feature_engineering_classification.py b/_sources/python_scripts/linear_models_feature_engineering_classification.py
@@ -235,7 +235,10 @@ def plot_decision_boundary(model, title=None):
 # %%
 from sklearn.preprocessing import KBinsDiscretizer
 
-classifier = make_pipeline(KBinsDiscretizer(n_bins=5), LogisticRegression())
+classifier = make_pipeline(
+    KBinsDiscretizer(n_bins=5, encode="onehot"),  # already the default params
+    LogisticRegression(),
+)
 classifier
 
 # %%
@@ -279,15 +282,20 @@ def plot_decision_boundary(model, title=None):
 # We can see that the decision boundary is now smooth, and while it favors
 # axis-aligned decision rules when extrapolating in low density regions, it can
 # adopt a more curvy decision boundary in the high density regions.
-#
-# Note however, that the number of knots is a hyperparameter that needs to be
-# tuned. If we use too few knots, the model would underfit the data, as shown on
-# the moons dataset. If we use too many knots, the model would overfit the data.
-#
 # However, as for the binning transformation, the model still fails to separate
 # the data for the XOR dataset, irrespective of the number of knots, for the
 # same reasons: **the spline transformation is a feature-wise transformation**
 # and thus **cannot capture interactions** between features.
+#
+# Keep in mind that the number of knots is a hyperparameter that needs to be
+# tuned. With too few knots, the model underfits the data, as shown on
+# the moons dataset. With too many knots, the model overfits the data.
+#
+# ```{note}
+# Notice that `KBinsDiscretizer(encode="onehot")` and `SplineTransformer` do not
+# require additional scaling. Indeed, they can replace the scaling step for
+# numerical features: they both create features with values in the [0, 1] range.
+# ```
 
 # %% [markdown]
 #

diff --git a/appendix/notebook_timings.html b/appendix/notebook_timings.html
@@ -1004,9 +1004,9 @@ <h1>Notebook timings<a class="headerlink" href="#notebook-timings" title="Permal
 <td><p>✅</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="xref doc reference internal" href="../python_scripts/linear_models_feature_engineering_classification.html"><span class="doc">python_scripts/linear_models_feature_engineering_classification</span></a></p></td>
-<td><p>2023-10-20 14:15</p></td>
+<td><p>2023-10-26 11:59</p></td>
 <td><p>cache</p></td>
-<td><p>10.8</p></td>
+<td><p>10.37</p></td>
 <td><p>✅</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="xref doc reference internal" href="../python_scripts/linear_models_regularization.html"><span class="doc">python_scripts/linear_models_regularization</span></a></p></td>

diff --git a/python_scripts/linear_models_feature_engineering_classification.html b/python_scripts/linear_models_feature_engineering_classification.html
@@ -935,7 +935,10 @@ <h2>Engineering non-linear features<a class="headerlink" href="#engineering-non-
 <div class="cell_input docutils container">
 <div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">sklearn.preprocessing</span> <span class="kn">import</span> <span class="n">KBinsDiscretizer</span>
 
-<span class="n">classifier</span> <span class="o">=</span> <span class="n">make_pipeline</span><span class="p">(</span><span class="n">KBinsDiscretizer</span><span class="p">(</span><span class="n">n_bins</span><span class="o">=</span><span class="mi">5</span><span class="p">),</span> <span class="n">LogisticRegression</span><span class="p">())</span>
+<span class="n">classifier</span> <span class="o">=</span> <span class="n">make_pipeline</span><span class="p">(</span>
+    <span class="n">KBinsDiscretizer</span><span class="p">(</span><span class="n">n_bins</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">encode</span><span class="o">=</span><span class="s2">&quot;onehot&quot;</span><span class="p">),</span>  <span class="c1"># already the default params</span>
+    <span class="n">LogisticRegression</span><span class="p">(),</span>
+<span class="p">)</span>
 <span class="n">classifier</span>
 </pre></div>
 </div>
@@ -999,14 +1002,20 @@ <h2>Engineering non-linear features<a class="headerlink" href="#engineering-non-
 </div>
 <p>We can see that the decision boundary is now smooth, and while it favors
 axis-aligned decision rules when extrapolating in low density regions, it can
-adopt a more curvy decision boundary in the high density regions.</p>
-<p>Note however, that the number of knots is a hyperparameter that needs to be
-tuned. If we use too few knots, the model would underfit the data, as shown on
-the moons dataset. If we use too many knots, the model would overfit the data.</p>
-<p>However, as for the binning transformation, the model still fails to separate
+adopt a more curvy decision boundary in the high density regions.
+However, as for the binning transformation, the model still fails to separate
 the data for the XOR dataset, irrespective of the number of knots, for the
 same reasons: <strong>the spline transformation is a feature-wise transformation</strong>
 and thus <strong>cannot capture interactions</strong> between features.</p>
+<p>Keep in mind that the number of knots is a hyperparameter that needs to be
+tuned. With too few knots, the model underfits the data, as shown on
+the moons dataset. With too many knots, the model overfits the data.</p>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
+<p>Notice that <code class="docutils literal notranslate"><span class="pre">KBinsDiscretizer(encode=&quot;onehot&quot;)</span></code> and <code class="docutils literal notranslate"><span class="pre">SplineTransformer</span></code> do not
+require additional scaling. Indeed, they can replace the scaling step for
+numerical features: they both create features with values in the [0, 1] range.</p>
+</div>
 </section>
 <section id="modeling-non-additive-feature-interactions">
 <h2>Modeling non-additive feature interactions<a class="headerlink" href="#modeling-non-additive-feature-interactions" title="Permalink to this heading">#</a></h2>
@@ -1084,7 +1093,7 @@ <h2>Modeling non-additive feature interactions<a class="headerlink" href="#model
 </div>
 </div>
 <div class="cell_output docutils container">
-<img alt="../_images/d98cf10afde42d00cba794b3555d7e9ba000cebdef829967a70a33ccba1b60db.png" src="../_images/d98cf10afde42d00cba794b3555d7e9ba000cebdef829967a70a33ccba1b60db.png" />
+<img alt="../_images/96da411b5c4ebfaa3ebeb9c05c1fa91e8164f132b58558585e11ad4b7d55a671.png" src="../_images/96da411b5c4ebfaa3ebeb9c05c1fa91e8164f132b58558585e11ad4b7d55a671.png" />
 </div>
 </div>
 <p>The polynomial kernel approach would be interesting in cases where the
@@ -1120,7 +1129,7 @@ <h2>Modeling non-additive feature interactions<a class="headerlink" href="#model
 </div>
 </div>
 <div class="cell_output docutils container">
-<img alt="../_images/a830e52976bbc92558a4865dcfe3ea4ee88afda4db3236c609b4f65dca9f6558.png" src="../_images/a830e52976bbc92558a4865dcfe3ea4ee88afda4db3236c609b4f65dca9f6558.png" />
+<img alt="../_images/fb409cbf68b13df4149fba8f25d820751b1e5ad85c2612c3fe73045a40c0c004.png" src="../_images/fb409cbf68b13df4149fba8f25d820751b1e5ad85c2612c3fe73045a40c0c004.png" />
 </div>
 </div>
 <p>The resulting decision boundary is <strong>smooth</strong> and can successfully separate
@@ -1197,7 +1206,7 @@ <h2>Multi-step feature engineering<a class="headerlink" href="#multi-step-featur
 </div>
 </div>
 <div class="cell_output docutils container">
-<img alt="../_images/65904496ae44a4a185e7c69818a8751bd821541b100b26091cf76db157f5a3f6.png" src="../_images/65904496ae44a4a185e7c69818a8751bd821541b100b26091cf76db157f5a3f6.png" />
+<img alt="../_images/aa30de15e213b5786f4300f81791f9ae43dbe0b3edb40fcea1c7d5ec46154031.png" src="../_images/aa30de15e213b5786f4300f81791f9ae43dbe0b3edb40fcea1c7d5ec46154031.png" />
 </div>
 </div>
 <p>The decision boundary of this pipeline is smooth, but with axis-aligned

diff --git a/searchindex.js b/searchindex.js