Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pawlak Dominik HW1 #10

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
103 changes: 103 additions & 0 deletions Homeworks/Homework-I/Pawlak Dominik/Pawlak Dominik HW1.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
---
title: "PawlakDominik_HW1"
output: pdf_document
date: '2022-03-30'
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```



```{r cars}
library(DALEX)
library(caret)
set.seed(2137)

data <- read.csv("insurance.csv")
head(data)
```
Now, let's split the data into training and test datasets.
```{r}
index <- createDataPartition(apartments$m2.price, p = 0.5, list = FALSE)

train <- data[index,]
test <- data[-index,]

x_train <- train[,-c(7)]
y_train <- train[, 7]

x_test <- test[,-c(7)]
y_test <- test[, 7]
```

After splitting the data, we can train the model.
```{r}
library(ranger)

forest <- ranger(charges~., data=train)
y_pred <- predict(forest, x_test)
print(y_pred$predictions[50])
print(y_test[50])
```

Let's create explainer, then BreakDown Composition for this observation.
```{r}
explainer_rf <- DALEX::explain(forest,
data = x_test,
y = y_test)

bd_pr <- predict_parts(explainer = explainer_rf,
new_observation = x_test[50,],
type = "break_down")
bd_pr
```
Now let's check Shapley values
```{r}
shap_pr <- predict_parts(explainer = explainer_rf,
new_observation = x_test[50,],
type = "shap")
shap_pr
```

Let's plot and compare both charts
```{r}
plot(bd_pr)
plot(shap_pr)
```

Both plots suggest that variable smoker, with "yes" value has the biggest impact on the prediction and decreases the result. The variable "age" also decreases the prediction. Both plots suggest that 'sex' variable doesn't have big influence on the result. According to Break Down decomposition the region variable increases the prediction, whereas the according to the shapley values, this variable decreases it.

Now, let's find a female who doesn't smoke and check the results for that person.
```{r}
observation2 <- test[(test$sex=="female" & test$smoker=="no" & test$age >= 64),]
observation2 <- observation2[1,]
observation2
```

Now let's repeat steps for this observation.
```{r}
bd_pr <- predict_parts(explainer = explainer_rf,
new_observation = observation2,
type = "break_down")
bd_pr
```
And plot it:
```{r}
plot(bd_pr)
```
Now let's check Shapley values
```{r}
shap_pr <- predict_parts(explainer = explainer_rf,
new_observation = observation2,
type = "shap")
shap_pr
```
And plot it:
```{r}
plot(shap_pr)
```

Conclusions:
In the first observation, both plots present "smoker" (yes) as the most significant variable, that decreases the predicted result. On contrary, in the second observation, "age" variable turns out to be the most significant and it increases the predictions. What's more interesting and surprising the "smoker" (no) decreases the predictions. In both scenarios "children" and "sex" variables seem not to have big impact on the result. In the first case, "region" variable has also big impact, wheras in the second observation, it hardly affects the predictions.
Binary file not shown.