Generate notebooks
lesteve committed Nov 6, 2023
1 parent 4851394 commit 26ad6d2
Showing 8 changed files with 51 additions and 119 deletions.
2 changes: 1 addition & 1 deletion notebooks/datasets_adult_census.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# The Adult census dataset\n",
"# The adult census dataset\n",
"\n",
"[This dataset](http://www.openml.org/d/1590) is a collection of demographic\n",
"information for the adult population as of 1994 in the USA. The prediction\n",
2 changes: 1 addition & 1 deletion notebooks/linear_models_ex_03.ipynb
@@ -131,7 +131,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For the following questions, you can copy adn paste the following snippet to\n",
"For the following questions, you can copy and paste the following snippet to\n",
"get the feature names from the column transformer here named `preprocessor`.\n",
"\n",
"```python\n",
4 changes: 2 additions & 2 deletions notebooks/linear_models_sol_02.ipynb
@@ -223,9 +223,9 @@
"outputs": [],
"source": [
"# solution\n",
"culmen_length_first_sample = 181.0\n",
"flipper_length_first_sample = 181.0\n",
"culmen_depth_first_sample = 18.7\n",
"culmen_length_first_sample * culmen_depth_first_sample"
"flipper_length_first_sample * culmen_depth_first_sample"
]
},
{
2 changes: 1 addition & 1 deletion notebooks/linear_models_sol_03.ipynb
@@ -211,7 +211,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For the following questions, you can copy adn paste the following snippet to\n",
"For the following questions, you can copy and paste the following snippet to\n",
"get the feature names from the column transformer here named `preprocessor`.\n",
"\n",
"```python\n",
65 changes: 13 additions & 52 deletions notebooks/linear_regression_non_linear_link.ipynb
@@ -2,7 +2,6 @@
"cells": [
{
"cell_type": "markdown",
"id": "14eec485",
"metadata": {},
"source": [
"# Non-linear feature engineering for Linear Regression\n",
@@ -25,7 +24,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "8f516165",
"metadata": {},
"outputs": [],
"source": [
@@ -44,13 +42,13 @@
},
{
"cell_type": "markdown",
"id": "00fd3b4f",
"metadata": {},
"source": [
"```{tip}\n",
"`np.random.RandomState` allows to create a random number generator which can\n",
"be later used to get deterministic results.\n",
"```\n",
"<div class=\"admonition tip alert alert-warning\">\n",
"<p class=\"first admonition-title\" style=\"font-weight: bold;\">Tip</p>\n",
"<p class=\"last\"><tt class=\"docutils literal\">np.random.RandomState</tt> allows to create a random number generator which can\n",
"be later used to get deterministic results.</p>\n",
"</div>\n",
"\n",
"To ease the plotting, we create a pandas dataframe containing the data and\n",
"target:"
@@ -59,7 +57,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "5459a97b",
"metadata": {},
"outputs": [],
"source": [
@@ -71,7 +68,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "8b1b2257",
"metadata": {},
"outputs": [],
"source": [
@@ -84,22 +80,21 @@
},
{
"cell_type": "markdown",
"id": "be69fae1",
"metadata": {},
"source": [
"```{warning}\n",
"In scikit-learn, by convention `data` (also called `X` in the scikit-learn\n",
"documentation) should be a 2D matrix of shape `(n_samples, n_features)`.\n",
"If `data` is a 1D vector, you need to reshape it into a matrix with a\n",
"<div class=\"admonition warning alert alert-danger\">\n",
"<p class=\"first admonition-title\" style=\"font-weight: bold;\">Warning</p>\n",
"<p class=\"last\">In scikit-learn, by convention <tt class=\"docutils literal\">data</tt> (also called <tt class=\"docutils literal\">X</tt> in the scikit-learn\n",
"documentation) should be a 2D matrix of shape <tt class=\"docutils literal\">(n_samples, n_features)</tt>.\n",
"If <tt class=\"docutils literal\">data</tt> is a 1D vector, you need to reshape it into a matrix with a\n",
"single column if the vector represents a feature or a single row if the\n",
"vector represents a sample.\n",
"```"
"vector represents a sample.</p>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46804be9",
"metadata": {},
"outputs": [],
"source": [
@@ -110,7 +105,6 @@
},
{
"cell_type": "markdown",
"id": "a4209f00",
"metadata": {
"lines_to_next_cell": 2
},
@@ -122,7 +116,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "a1bd392b",
"metadata": {},
"outputs": [],
"source": [
@@ -142,7 +135,6 @@
},
{
"cell_type": "markdown",
"id": "7bfcbeb8",
"metadata": {},
"source": [
"We now observe the limitations of fitting a linear regression model."
@@ -151,7 +143,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1545fec5",
"metadata": {},
"outputs": [],
"source": [
@@ -165,7 +156,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "e8c79631",
"metadata": {},
"outputs": [],
"source": [
@@ -174,7 +164,6 @@
},
{
"cell_type": "markdown",
"id": "545fc1f3",
"metadata": {},
"source": [
"Here the coefficient and intercept learnt by `LinearRegression` define the\n",
@@ -185,7 +174,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "0f95ceef",
"metadata": {},
"outputs": [],
"source": [
@@ -197,7 +185,6 @@
},
{
"cell_type": "markdown",
"id": "1a34a48c",
"metadata": {},
"source": [
"Notice that the learnt model cannot handle the non-linear relationship between\n",
@@ -217,7 +204,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "e01b02d2",
"metadata": {},
"outputs": [],
"source": [
@@ -230,7 +216,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "9a27773e",
"metadata": {},
"outputs": [],
"source": [
@@ -239,7 +224,6 @@
},
{
"cell_type": "markdown",
"id": "4d5070e3",
"metadata": {},
"source": [
"Instead of having a model which can natively deal with non-linearity, we could\n",
@@ -256,7 +240,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "28c13246",
"metadata": {},
"outputs": [],
"source": [
@@ -266,7 +249,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "69d0ba50",
"metadata": {},
"outputs": [],
"source": [
@@ -276,7 +258,6 @@
},
{
"cell_type": "markdown",
"id": "7925141e",
"metadata": {},
"source": [
"Instead of manually creating such polynomial features one could directly use\n",
@@ -286,7 +267,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d31ed0f4",
"metadata": {},
"outputs": [],
"source": [
@@ -297,7 +277,6 @@
},
{
"cell_type": "markdown",
"id": "6a7fe453",
"metadata": {},
"source": [
"In the previous cell we had to set `include_bias=False` as otherwise we would\n",
@@ -312,7 +291,6 @@
},
{
"cell_type": "markdown",
"id": "269fbe2b",
"metadata": {},
"source": [
"To demonstrate the use of the `PolynomialFeatures` class, we use a\n",
@@ -323,7 +301,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "38ba0c5c",
"metadata": {},
"outputs": [],
"source": [
@@ -340,7 +317,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "5df7d4a4",
"metadata": {},
"outputs": [],
"source": [
@@ -349,7 +325,6 @@
},
{
"cell_type": "markdown",
"id": "fe259d20",
"metadata": {},
"source": [
"We can see that even with a linear model, we can overcome the linearity\n",
@@ -379,7 +354,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "7d46da9b",
"metadata": {},
"outputs": [],
"source": [
@@ -392,7 +366,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "9406b676",
"metadata": {},
"outputs": [],
"source": [
@@ -401,7 +374,6 @@
},
{
"cell_type": "markdown",
"id": "fd29730e",
"metadata": {},
"source": [
"The predictions of our SVR with a linear kernel are all aligned on a straight\n",
@@ -419,7 +391,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ae1550fa",
"metadata": {},
"outputs": [],
"source": [
@@ -430,7 +401,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c4670a4e",
"metadata": {},
"outputs": [],
"source": [
@@ -439,7 +409,6 @@
},
{
"cell_type": "markdown",
"id": "732b2b0f",
"metadata": {},
"source": [
"Kernel methods such as SVR are very efficient for small to medium datasets.\n",
@@ -460,7 +429,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "e30e6b37",
"metadata": {},
"outputs": [],
"source": [
@@ -476,7 +444,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b46eb0ef",
"metadata": {},
"outputs": [],
"source": [
@@ -486,7 +453,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "5403e6b1",
"metadata": {},
"outputs": [],
"source": [
@@ -502,7 +468,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "0dcdfe92",
"metadata": {},
"outputs": [],
"source": [
@@ -511,7 +476,6 @@
},
{
"cell_type": "markdown",
"id": "4b4f0560",
"metadata": {},
"source": [
"`Nystroem` is a nice alternative to `PolynomialFeatures` that makes it\n",
@@ -523,7 +487,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "41d6abd8",
"metadata": {},
"outputs": [],
"source": [
@@ -539,7 +502,6 @@
{
"cell_type": "code",
"execution_count": null,
"id": "be6a232c",
"metadata": {},
"outputs": [],
"source": [
@@ -550,7 +512,6 @@
},
{
"cell_type": "markdown",
"id": "7860e12d",
"metadata": {},
"source": [
"## Notebook Recap\n",
@@ -579,4 +540,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}
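
Note on the two admonitions regenerated in `notebooks/linear_regression_non_linear_link.ipynb`: the tip about `np.random.RandomState` and the warning about 2D `data` can be illustrated with a minimal sketch. The seed, the sample size and the variable names below are assumptions chosen for illustration; they are not taken from the notebook.

```python
import numpy as np

# Assumption: seed 0 and 5 samples are arbitrary illustrative choices.
rng = np.random.RandomState(0)
feature = rng.uniform(size=5)  # 1D vector of shape (5,)

# A 1D vector representing one feature becomes a single-column matrix
# of shape (n_samples, n_features) = (5, 1).
data = feature.reshape(-1, 1)

# A 1D vector representing one sample becomes a single-row matrix
# of shape (n_samples, n_features) = (1, 5).
sample = feature.reshape(1, -1)

# Re-creating the generator with the same seed reproduces the same draws,
# which is what makes the results deterministic.
rng_again = np.random.RandomState(0)
same_feature = rng_again.uniform(size=5)

print(data.shape, sample.shape, np.array_equal(feature, same_feature))
```

Passing `-1` to `reshape` lets NumPy infer that dimension from the array length, so the same line works regardless of the number of samples.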