-
Notifications
You must be signed in to change notification settings - Fork 0
/
hw9-data-frames.Rmd
69 lines (49 loc) · 3.56 KB
/
hw9-data-frames.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
---
title: "Homework 9 - Data Frames"
output:
html_document:
toc: false
number_sections: false
params:
number: 9
submit: "https://u.pcloud.com/#page=puplink&code=zVYkZj1qH1BqO2g7gAwnMNdXXuYKw31py"
purpose: |
The purposes of this assignment are:
>
> - To practice creating data frames in R.
> - To practice merging and slicing data frames in R.
---
```{r child = file.path("child", "setup.Rmd")}
```
```{r child = file.path("child", "hw.Rmd")}
```
### 1) Staying organized [SOLO, 5%]
Download and use [this template](templates/hw9.zip) for your assignment. Inside the "hw9" folder, open and edit the R script called "hw9.R" and fill out your name, GW Net ID, and the names of anyone you worked with on this assignment.
> ### **Using good style**
>
> For this assignment, you must use good style to receive full credit. Follow the best practices described in [this style guide](http://adv-r.had.co.nz/Style.html).
### 2) Inspect package data [SOLO, 15%]
Write R code to install the **dslabs** package from CRAN, then write code to load the package. Write some code to preview and inspect the `movielens` data frame that gets loaded when you load the package using some of the techniques we saw in class. For each of the following questions, write code to find your answer and leave a detailed response in a comment:
- What is this dataset about?
- How many observations are in the data frame?
- What is the original source of the data?
- What type of data is each variable?
- What are the years of the earliest and most recent observations in the data set?
### 3) Answer questions about the data [COLLABORATIVE, 25%]
For each of the following questions, write code to find your answer and leave a detailed response in a comment:
- What is the min, mean, and max rating in the data set?
- How many observations received the maximum rating?
- What percentage of total observations received the maximum rating?
- What is the title of the observation with the longest `title` (in terms of numbers of letters in the title)?
### 4) Loading and inspecting external data [SOLO, 20%]
Write R code to read in the `prisoners2019.csv` file located in the `data` folder. Store the object as `df`. Write some code to preview and inspect the `df` data frame using some of the techniques we saw in class. For each of the following questions, write code to find your answer and leave a detailed response in a comment:
- What do you think this dataset is about?
- How many observations are in the data frame?
- What type of data is each variable?
### 5) Answer questions about the data [COLLABORATIVE, 25%]
For each of the following questions, write code to find your answer and leave a detailed response in a comment:
- Which states have the highest and lowest total prison population?
- Which states have the highest and lowest total prison population _as a percentage of the total state population_?
- According to the 2020 U.S. Census, [14.2%]() of the U.S. population is black. Nonetheless, some states have imprisoned more black people than any other race. Which states fit this description?
### 6) Read and reflect [SOLO, 10%]
Read and reflect on next week's readings on [data wrangling](r10-data-wrangling.html). Afterwards, in a comment (`#`) in your R file, write a short reflection on what you've learned and any questions or points of confusion you have about what we've covered thus far. This can just few a few sentences related to this assignment, next week's readings, things going on in the world that remind you something from class, etc. If there's anything that jumped out at you, write it down.