Merge branch 'main' into debug-pr-preview
ogrisel authored Oct 26, 2023
2 parents dfc55f2 + d51a62b commit f5bb3f2
Showing 26 changed files with 900 additions and 500 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/deploy-gh-pages.yml
@@ -32,7 +32,7 @@ jobs:
pip install -r requirements-dev.txt
- name: Cache jupyter-cache folder
-uses: actions/cache@v2
+uses: actions/cache@v3
env:
cache-name: jupyter-cache
with:
12 changes: 6 additions & 6 deletions .github/workflows/jupyter-book-pr-preview.yml
@@ -19,19 +19,19 @@ jobs:
sha: ${{ github.event.workflow_run.head_sha }}
context: 'JupyterBook preview'

-- name: Get pull request number
-id: pull-request-number
-run: |
-export PULL_REQUEST_NUMBER=${{github.event.workflow_run.event.number}}
-echo "result=${PULL_REQUEST_NUMBER}" >> $GITHUB_OUTPUT
- uses: dawidd6/action-download-artifact@v2
with:
github_token: ${{secrets.GITHUB_TOKEN}}
workflow: deploy-gh-pages.yml
pr: ${{steps.pull-request-number.outputs.result}}
name: jupyter-book

+- name: Get pull request number
+id: pull-request-number
+run: |
+export PULL_REQUEST_NUMBER=$(cat pull_request_number)
+echo "result=${PULL_REQUEST_NUMBER}" >> $GITHUB_OUTPUT
- uses: actions/setup-node@v3
with:
node-version: '16'
8 changes: 3 additions & 5 deletions jupyter-book/_toc.yml
@@ -90,29 +90,27 @@ parts:
- file: linear_models/linear_models_intuitions_index
sections:
- file: linear_models/linear_models_slides
-- file: linear_models/linear_models_quiz_m4_01
- file: python_scripts/linear_regression_without_sklearn
- file: python_scripts/linear_models_ex_01
- file: python_scripts/linear_models_sol_01
- file: python_scripts/linear_regression_in_sklearn
- file: python_scripts/logistic_regression
-- file: linear_models/linear_models_quiz_m4_02
+- file: linear_models/linear_models_quiz_m4_01
- file: linear_models/linear_models_non_linear_index
sections:
- file: python_scripts/linear_regression_non_linear_link
- file: python_scripts/linear_models_ex_02
- file: python_scripts/linear_models_sol_02
- file: python_scripts/linear_models_feature_engineering_classification.py
- file: python_scripts/logistic_regression_non_linear
-- file: linear_models/linear_models_quiz_m4_03
+- file: linear_models/linear_models_quiz_m4_02
- file: linear_models/linear_models_regularization_index
sections:
- file: linear_models/regularized_linear_models_slides
- file: python_scripts/linear_models_regularization
-- file: linear_models/linear_models_quiz_m4_04
- file: python_scripts/linear_models_ex_03
- file: python_scripts/linear_models_sol_03
-- file: linear_models/linear_models_quiz_m4_05
+- file: linear_models/linear_models_quiz_m4_03
- file: linear_models/linear_models_wrap_up_quiz
- file: linear_models/linear_models_module_take_away
- caption: Decision tree models
67 changes: 66 additions & 1 deletion jupyter-book/linear_models/linear_models_quiz_m4_01.md
@@ -17,10 +17,75 @@ _Select a single answer_

```{admonition} Question
Is it possible to get a perfect fit (zero prediction error on the training set)
-with a linear classifier by itself on a non-linearly separable dataset?
+with a linear classifier **by itself** on a non-linearly separable dataset?
- a) yes
- b) no
_Select a single answer_
```

+++

```{admonition} Question
If we fit a linear regression where `X` is a single column vector, how many
parameters will our model be made of?
- a) 1
- b) 2
- c) 3
_Select a single answer_
```

+++

```{admonition} Question
If we train a scikit-learn `LinearRegression` with `X` being a single column
vector and `y` a vector, `coef_` and `intercept_` will be respectively:
- a) an array of shape (1, 1) and a number
- b) an array of shape (1,) and an array of shape (1,)
- c) an array of shape (1, 1) and an array of shape (1,)
- d) an array of shape (1,) and a number
_Select a single answer_
```
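A minimal sketch to check the two questions above, on made-up synthetic data
(scikit-learn defaults assumed): with a single-column `X`, the model learns one
slope plus one intercept, i.e. two parameters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 1))          # single column vector
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)
print(model.coef_.shape)   # (1,): one slope
print(model.intercept_)    # a single number: the intercept
```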

+++

```{admonition} Question
The decision boundaries of a logistic regression model:
- a) split classes using only one of the input features
- b) split classes using a combination of the input features
- c) often have curved shapes
_Select a single answer_
```
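For intuition, the fitted boundary is the set of points where
`coef_ @ x + intercept_ = 0`, a linear combination of all input features. A
sketch on synthetic two-feature data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
                           random_state=0)
clf = LogisticRegression().fit(X, y)
# The boundary coef_ @ x + intercept_ = 0 is a straight line in 2D:
print(clf.coef_, clf.intercept_)
```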

+++

```{admonition} Question
For a binary classification task, what is the shape of the array returned by the
`predict_proba` method for 10 input samples?
- a) (10,)
- b) (10, 2)
- c) (2, 10)
_Select a single answer_
```

+++

```{admonition} Question
In logistic regression's `predict_proba` method in scikit-learn, which of the
following statements is true regarding the predicted probabilities?
- a) The sum of probabilities across different classes for a given sample is always equal to 1.0.
- b) The sum of probabilities across all samples for a given class is always equal to 1.0.
- c) The sum of probabilities across all features for a given class is always equal to 1.0.
_Select a single answer_
```
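Both `predict_proba` questions can be checked directly on a synthetic binary
problem with 10 held-out samples:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=60, random_state=0)
clf = LogisticRegression().fit(X[:50], y[:50])

proba = clf.predict_proba(X[50:])   # the 10 remaining samples
print(proba.shape)                  # (10, 2): one column per class
print(proba.sum(axis=1))            # each row sums to 1.0
```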
59 changes: 18 additions & 41 deletions jupyter-book/linear_models/linear_models_quiz_m4_02.md
@@ -1,64 +1,41 @@
# ✅ Quiz M4.02

```{admonition} Question
-If we fit a linear regression where `X` is a single column vector, how many
-parameters our model will be made of?
-- a) 1
-- b) 2
-- c) 3
+Let us consider a pipeline that combines a polynomial feature extraction of
+degree 2 and a linear regression model. Let us assume that the linear regression
+coefficients are all non-zero and that the dataset contains a single feature.
+Is the prediction function of this pipeline a straight line?
_Select a single answer_
```
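A sketch of the pipeline in question, on synthetic data: with a non-zero
degree-2 coefficient, equal steps in `X` produce unequal steps in the
predictions, so the prediction function is a curve, not a straight line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X.ravel() ** 2 - X.ravel() + rng.normal(scale=0.2, size=200)

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

X_grid = np.linspace(-3, 3, 5).reshape(-1, 1)   # equally spaced inputs
print(np.diff(model.predict(X_grid)))           # unequal steps: not a line
```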

+++

-```{admonition} Question
-If we train a scikit-learn `LinearRegression` with `X` being a single column
-vector and `y` a vector, `coef_` and `intercept_` will be respectively:
-- a) an array of shape (1, 1) and a number
-- b) an array of shape (1,) and an array of shape (1,)
-- c) an array of shape (1, 1) and an array of shape (1,)
-- d) an array of shape (1,) and a number
-_Select a single answer_
-```
-
-+++

```{admonition} Question
-The decision boundaries of a logistic regression model:
-- a) split classes using only one of the input features
-- b) split classes using a combination of the input features
-- c) often have curved shapes
+- a) yes
+- b) no
_Select a single answer_
```

+++

```{admonition} Question
-For a binary classification task, what is the shape of the array returned by the
-`predict_proba` method for 10 input samples?
+Fitting a linear regression where `X` has `n_features` columns and the target
+is a single continuous vector, what is the respective type/shape of `coef_`
+and `intercept_`?
-- a) (10,)
-- b) (10, 2)
-- c) (2, 10)
+- a) it is not possible to fit a linear regression in dimension higher than 2
+- b) array of shape (`n_features`,) and a float
+- c) array of shape (1, `n_features`) and an array of shape (1,)
_Select a single answer_
```
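The multi-feature case can be checked the same way, on made-up data with 5
features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))          # n_features = 5
y = X @ rng.normal(size=5) + 1.0

model = LinearRegression().fit(X, y)
print(model.coef_.shape)   # (5,): one weight per feature
print(model.intercept_)    # a single float
```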

+++

```{admonition} Question
-In logistic regression's `predict_proba` method in scikit-learn, which of the
-following statements is true regarding the predicted probabilities?
+Combining (one or more) feature engineering transformers in a single pipeline:
-- a) The sum of probabilities across different classes for a given sample is always equal to 1.0.
-- b) The sum of probabilities across all samples for a given class is always equal to 1.0.
-- c) The sum of probabilities across all features for a given class is always equal to 1.0.
+- a) increases the expressivity of the model
+- b) ensures that models extrapolate accurately regardless of the distribution of the data
+- c) may require tuning additional hyperparameters
+- d) inherently prevents any underfitting
-_Select a single answer_
+_Select all answers that apply_
```
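As an illustration of the added hyperparameters, a hypothetical pipeline
chaining two transformers before a ridge model exposes each transformer's
parameters for tuning (this assumes scikit-learn >= 1.0 for
`SplineTransformer`):

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, SplineTransformer

model = make_pipeline(
    SplineTransformer(n_knots=5),    # n_knots is an extra hyperparameter
    PolynomialFeatures(degree=2),    # so is degree
    Ridge(alpha=1.0),
)
print(sorted(model.get_params()))    # all of them are tunable via grid search
```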
98 changes: 80 additions & 18 deletions jupyter-book/linear_models/linear_models_quiz_m4_03.md
@@ -1,41 +1,103 @@
# ✅ Quiz M4.03

```{admonition} Question
+Which of the following estimators can solve linear regression problems?
-Let us consider a pipeline that combines a polynomial feature extraction of
-degree 2 and a linear regression model. Let us assume that the linear regression
-coefficients are all non-zero and that the dataset contains a single feature.
-Is the prediction function of this pipeline a straight line?
+- a) sklearn.linear_model.LinearRegression
+- b) sklearn.linear_model.LogisticRegression
+- c) sklearn.linear_model.Ridge
-- a) yes
-- b) no
_Select all answers that apply_
```

+++

```{admonition} Question
Regularization allows:
- a) to create a model robust to outliers (samples that differ widely from
other observations)
- b) to reduce overfitting by forcing the weights to stay close to zero
- c) to reduce underfitting by making the problem linearly separable
_Select a single answer_
```
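A rough check on synthetic data: when noisy features outnumber what the data
can support, the ridge penalty trades a little training score for better
generalization.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.normal(size=(60, 40))                  # few samples, many features
y = X[:, :3].sum(axis=1) + rng.normal(scale=2.0, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in [LinearRegression(), Ridge(alpha=10.0)]:
    model.fit(X_train, y_train)
    print(model.__class__.__name__,
          model.score(X_train, y_train),   # train R^2: near 1 without penalty
          model.score(X_test, y_test))     # test R^2: better with the penalty
```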

+++

```{admonition} Question
-Fitting a linear regression where `X` has `n_features` columns and the target
-is a single continuous vector, what is the respective type/shape of `coef_`
-and `intercept_`?
+A ridge model is:
-- a) it is not possible to fit a linear regression in dimension higher than 2
-- b) array of shape (`n_features`,) and a float
-- c) array of shape (1, `n_features`) and an array of shape (1,)
+- a) the same as linear regression with penalized weights
+- b) the same as logistic regression with penalized weights
+- c) a linear model
+- d) a non linear model
-_Select a single answer_
+_Select all answers that apply_
```

+++

```{admonition} Question
-Combining (one or more) feature engineering transformers in a single pipeline:
+Assume that a data scientist has prepared a train/test split and plans to use
+the test for the final evaluation of a `Ridge` model. The parameter `alpha` of
+the `Ridge` model:
-- a) increases the expressivity of the model
-- b) ensures that models extrapolate accurately regardless of its distribution
-- c) may require tuning additional hyperparameters
-- d) inherently prevents any underfitting
+- a) is internally tuned when calling `fit` on the train set
+- b) should be tuned by running cross-validation on a **train set**
+- c) should be tuned by running cross-validation on a **test set**
+- d) must be a positive number
_Select all answers that apply_
```
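A sketch of that workflow with `RidgeCV`, on synthetic data (the candidate
`alphas` grid is arbitrary): the search runs cross-validation on the train set
only, and the test set is touched once at the end.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X_train, y_train)
print(model.alpha_)                  # selected by CV on the train set
print(model.score(X_test, y_test))   # final evaluation, done once
```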

+++

```{admonition} Question
Scaling the data before fitting a model:
- a) is often useful for regularized linear models
- b) is always necessary for regularized linear models
- c) may speed up fitting
- d) has no impact on the optimal choice of the value of a regularization parameter
_Select all answers that apply_
```
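The usual pattern, sketched: put the scaler and the regularized model in one
pipeline, so the penalty acts on comparably scaled coefficients.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
# model.fit(X_train, y_train)  # then behaves like any single estimator
# (X_train and y_train are hypothetical training data)
```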

+++

```{admonition} Question
The effect of increasing the regularization strength in a ridge model is to:
- a) shrink all weights towards zero
- b) make all weights equal
- c) set a subset of the weights to exactly zero
- d) constrain all the weights to be positive
_Select all answers that apply_
```
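This can be observed directly on synthetic data: as `alpha` grows, every ridge
weight shrinks towards zero, but none is set exactly to zero (that behavior
belongs to Lasso).

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=50)

for alpha in [0.01, 1.0, 100.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    # all weights shrink as alpha grows, yet none hits exactly zero
    print(alpha, np.abs(coef).max(), np.abs(coef).min())
```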

+++

```{admonition} Question
The parameter `C` in a logistic regression is:
- a) similar to the parameter `alpha` in a ridge regressor
- b) similar to `1 / alpha` where `alpha` is the parameter of a ridge regressor
- c) not controlling the regularization
_Select a single answer_
```
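A quick way to see the inverse relationship, on synthetic data: decreasing `C`
strengthens the penalty, just as increasing `alpha` does for a ridge regressor.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
for C in [0.01, 1.0, 100.0]:
    coef = LogisticRegression(C=C).fit(X, y).coef_
    print(C, np.abs(coef).max())   # weights grow as C grows (weaker penalty)
```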

+++

```{admonition} Question
In logistic regression, increasing the regularization strength (by
decreasing the value of `C`) makes the model:
- a) more likely to overfit to the training data
- b) more confident: the values returned by `predict_proba` are closer to 0 or 1
- c) less complex, potentially underfitting the training data
_Select a single answer_
```
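And the effect on the predicted probabilities, sketched on the same kind of
synthetic data: with a very small `C` the weights are pushed towards zero, so
`predict_proba` drifts towards 0.5 rather than towards 0 or 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
for C in [100.0, 1.0, 0.001]:
    proba = LogisticRegression(C=C).fit(X, y).predict_proba(X)
    print(C, np.abs(proba - 0.5).mean())   # average confidence drops with C
```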