# 524/424 Final Exam: Take-Home Portion

## Admin

### Optional

As discussed in class, this portion of the exam is **optional**. If you choose not to submit this portion, your grade for the final exam will be based solely on the in-class portion. If you do submit this portion, it will count for 25% of your final-exam grade.

### Academic honesty

You **are not** allowed to work with anyone else. Working with *anyone* else will be considered cheating. You will receive a zero for **both** parts of the final exam and will fail the class.

You *can* use online materials (including ChatGPT and Copilot), books, notes, solutions, *etc*. However, you still must put all of your answers **in your own words**. Copying other people's (and chatbots') words is also considered cheating.

Ngan and Ed **will not** help you debug your code. Please do not ask.

### Instructions

**Due** Upload your answers to [Canvas](https://canvas.uoregon.edu/) *before* 10:15 **am** (Pacific) on Friday, 14 June 2024.

**Important** You **must** submit your answers as an HTML or PDF file, built from an RMarkdown (`.Rmd`) or Quarto (`.qmd`) file (you can also submit a link to an HTML page if you prefer that route).

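In case it helps, here is a minimal sketch of building the HTML from the R console; the filename `final-exam.Rmd` is just a placeholder.

```r
# Render an RMarkdown file to HTML (the filename is a placeholder; use your own)
rmarkdown::render("final-exam.Rmd", output_format = "html_document")
# For a Quarto file, the equivalent from a terminal is:
#   quarto render final-exam.qmd --to html
```
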
## Prompts

Let's end where we began: [predicting house prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/) (as we did in the first two problem sets). Specifically, let's see if you can beat your old score using all of your fancy new prediction knowledge and ML skills.

## Getting started

**[01]** (10 points) **Visualize** Make sure you remember all of the variables in the dataset. Once you understand (or recall) the variables, create three visualizations of the data that show some interesting insights. These figures should be publication quality: well labeled, aesthetically pleasing, and insightful.

*Why?* Visualization is good practice—you should always visualize your data before and after analyzing it. Start the exam by making a few figures to understand the data. You can always make better figures after you finish the other steps.

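To fix ideas, here is a minimal sketch of one possible figure, assuming the Kaggle training data has been downloaded as `train.csv`; the path, variables, and color choice are just one option, not a required figure.

```r
# A sketch of one possible figure: above-ground living area vs. sale price,
# colored by overall quality. Assumes train.csv sits in the working directory.
library(ggplot2)

train_df = read.csv("train.csv")

ggplot(train_df, aes(x = GrLivArea, y = SalePrice, color = factor(OverallQual))) +
  geom_point(alpha = 0.5) +
  scale_y_log10(labels = scales::dollar) +
  scale_color_viridis_d(name = "Overall quality") +
  labs(
    x = "Above-ground living area (sq. ft.)",
    y = "Sale price (log scale)",
    title = "Bigger, higher-quality houses sell for more"
  ) +
  theme_minimal()
```
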
**[02]** (10 points) **Better regression?** In the past we used fairly simplistic imputation approaches for missing data. This time, use a more "sophisticated" approach for imputation. Then rerun your original regression model, predict onto the test set, and report your score.

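One possible setup (not the required approach) uses KNN-based imputation inside a `tidymodels` recipe; the recipe steps, the five-neighbor choice, and the file paths below are all assumptions to adapt to your own pipeline.

```r
# A sketch: KNN imputation + a (log) linear-regression workflow, via tidymodels
library(tidymodels)

train_df = read.csv("train.csv")
test_df  = read.csv("test.csv")

# Kaggle scores RMSE on log(SalePrice), so model the log and exponentiate predictions
train_df = train_df %>% mutate(SalePrice = log(SalePrice))

lm_rec =
  recipe(SalePrice ~ ., data = train_df) %>%
  update_role(Id, new_role = "id") %>%
  step_impute_knn(all_predictors(), neighbors = 5) %>%   # the "fancier" imputation
  step_novel(all_nominal_predictors()) %>%               # guard against unseen levels in test
  step_other(all_nominal_predictors(), threshold = 0.02) %>%  # pool rare categories
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors())

lm_fit =
  workflow() %>%
  add_recipe(lm_rec) %>%
  add_model(linear_reg()) %>%
  fit(data = train_df)

# Predict onto the test set and write a Kaggle submission (back on the dollar scale)
submission = data.frame(
  Id = test_df$Id,
  SalePrice = exp(predict(lm_fit, new_data = test_df)$.pred)
)
write.csv(submission, "submission-lm.csv", row.names = FALSE)
```
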
*Questions:*

- Did the fancier imputation approach improve your model?
- Why would "better" imputation matter?

**[03]** (10 points) **Better-er regression?** Repeat **[02]**, but this time use a lasso regression model. Report your score.

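Here is a sketch of one way to tune the lasso, reusing `train_df` and the recipe `lm_rec` from the sketch in **[02]**; those names, the penalty grid, and the five-fold CV are all assumptions.

```r
# Lasso (mixture = 1) with a cross-validated penalty, via the glmnet engine
lasso_rec = lm_rec %>% step_normalize(all_numeric_predictors())  # glmnet wants scaled inputs

lasso_mod =
  linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

lasso_wf = workflow() %>% add_recipe(lasso_rec) %>% add_model(lasso_mod)

set.seed(42)
folds = vfold_cv(train_df, v = 5)

lasso_cv = tune_grid(
  lasso_wf,
  resamples = folds,
  grid = grid_regular(penalty(range = c(-4, 0)), levels = 30),  # penalties from 1e-4 to 1
  metrics = metric_set(rmse)
)

final_lasso =
  lasso_wf %>%
  finalize_workflow(select_best(lasso_cv, metric = "rmse")) %>%
  fit(data = train_df)

# Which coefficients did the lasso keep? (Compare these to your OLS variables.)
final_lasso %>% extract_fit_parsnip() %>% tidy() %>% filter(estimate != 0)
```
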
*Questions:*

- Did this approach improve your model?
- Did the lasso model choose similar variables to your OLS model?

**[04]** (10 points) **Going nonlinear?** Now use a random forest for the prediction. Make sure you tune it. Also: keep the variable-importance scores.

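A possible starting point for a tuned random forest, again reusing `train_df` (with its logged outcome) from the sketch in **[02]**; the grid values and five-fold CV are assumptions.

```r
# Random forest via ranger; trees don't need dummies or scaling, so the recipe is leaner
rf_rec =
  recipe(SalePrice ~ ., data = train_df) %>%
  update_role(Id, new_role = "id") %>%
  step_impute_knn(all_predictors(), neighbors = 5)

rf_mod =
  rand_forest(mtry = tune(), min_n = tune(), trees = 1000) %>%
  set_engine("ranger", importance = "impurity") %>%  # keep variable-importance scores
  set_mode("regression")

rf_wf = workflow() %>% add_recipe(rf_rec) %>% add_model(rf_mod)

set.seed(42)
rf_cv = tune_grid(
  rf_wf,
  resamples = vfold_cv(train_df, v = 5),
  grid = expand.grid(mtry = c(10, 25, 50), min_n = c(2, 5, 10)),
  metrics = metric_set(rmse)
)

final_rf =
  rf_wf %>%
  finalize_workflow(select_best(rf_cv, metric = "rmse")) %>%
  fit(data = train_df)

# Variable importance (uses the vip package)
vip::vip(extract_fit_engine(final_rf), num_features = 20)
```
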
*Questions:*

- Which hyperparameters did you tune?
- Did the random forest beat your penalized regression model? Report your score.
- Did the variable importance from the random forest match the variables chosen by your penalized regression model?

**[05]** (10 points) **Summary** Answer the following questions:

- Which model performed best?
- Would you say the "best" model is *significantly* better than the other models? Explain your answer. (One way to line the models up is sketched after this list.)
- What could make your model better?

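For the comparison, one option is to put the cross-validated RMSEs side by side; the objects `lasso_cv` and `rf_cv` come from the earlier sketches and are assumptions.

```r
# Collect each tuned model's best cross-validated RMSE (log scale) with its standard error;
# comparing mean ± std_err is a rough gauge of whether one model is "significantly" better.
bind_rows(
  show_best(lasso_cv, metric = "rmse", n = 1) %>% mutate(model = "lasso"),
  show_best(rf_cv,    metric = "rmse", n = 1) %>% mutate(model = "random forest")
) %>%
  select(model, mean, std_err)
```
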
**[Bonus]** (Optional; 5 points) Use a (tuned) boosted tree model. Report your score and compare it to the other models.

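If you attempt the bonus, here is a sketch of a tuned gradient-boosted tree via the `xgboost` engine, reusing `train_df` and `lasso_rec` from the earlier sketches; the fixed tree count and the small grid below are assumptions.

```r
# Boosted trees need numeric predictors, so reuse the dummied/normalized lasso recipe
xgb_mod =
  boost_tree(trees = 2000, tree_depth = tune(), learn_rate = tune(), min_n = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("regression")

xgb_wf = workflow() %>% add_recipe(lasso_rec) %>% add_model(xgb_mod)

set.seed(42)
xgb_cv = tune_grid(
  xgb_wf,
  resamples = vfold_cv(train_df, v = 5),
  grid = expand.grid(tree_depth = c(3, 6), learn_rate = c(0.01, 0.1), min_n = c(5, 20)),
  metrics = metric_set(rmse)
)

# Best cross-validated RMSE; compare it to the lasso and random forest results
show_best(xgb_cv, metric = "rmse", n = 1)
```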