Merge pull request #908 from caseyyoungflesh/04-fix_feline-data_v2.csv
Add data in place of `feline-data_v2.csv`, closes #717
naupaka authored Jan 7, 2025
2 parents 4d666c8 + c61e2a2 commit cd7dbe7
Showing 1 changed file with 25 additions and 55 deletions.
80 changes: 25 additions & 55 deletions episodes/04-data-structures-part1.Rmd
@@ -164,78 +164,50 @@ No matter how
complicated our analyses become, all data in R is interpreted as one of these
basic data types. This strictness has some really important consequences.

A user has added details of another cat. This information is in the file
`data/feline-data_v2.csv`.
A user has provided details of another cat. We can add an additional row to our cats table using `rbind()`.

```{r, eval=FALSE}
file.show("data/feline-data_v2.csv")
```

```{r, eval=FALSE}
coat,weight,likes_catnip
calico,2.1,1
black,5.0,0
tabby,3.2,1
tabby,2.3 or 2.4,1
```{r}
additional_cat <- data.frame(coat = "tabby", weight = "2.3 or 2.4", likes_catnip = 1)
additional_cat
cats2 <- rbind(cats, additional_cat)
cats2
```

Load the new cats data like before, and check what type of data we find in the
`weight` column:
Let's check what type of data we find in the
`weight` column of our new `cats2` object:

```{r}
cats <- read.csv(file="data/feline-data_v2.csv")
typeof(cats$weight)
typeof(cats2$weight)
```

Oh no, our weights aren't the double type anymore! If we try to do the same math
we did on them before, we run into trouble:

```{r}
cats$weight + 2
cats2$weight + 2
```

What happened?
The `cats` data we are working with is something called a *data frame*. Data frames
The `cats` (and `cats2`) data we are working with is something called a *data frame*. Data frames
are one of the most common and versatile types of *data structures* we will work with in R.
A given column in a data frame cannot be composed of different data types.
In this case, R does not read everything in the data frame column `weight` as a *double*, therefore the entire
In this case, R can no longer store everything in the data frame column `weight` as a *double* once we add the row for the additional cat (because its weight is `2.3 or 2.4`); therefore, the entire
column data type changes to something that is suitable for everything in the column.
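
The coercion described above can be reproduced in isolation. The sketch below is an illustration rather than part of the commit: it assumes `cats` holds the three cats from the lesson's `feline-data.csv` and rebuilds it by hand so the snippet is self-contained. `stringsAsFactors = FALSE` is set explicitly so that older R versions behave the same way.

```r
# Assumption: `cats` mirrors the three rows of the lesson's feline-data.csv.
cats <- data.frame(
  coat = c("calico", "black", "tabby"),
  weight = c(2.1, 5.0, 3.2),
  likes_catnip = c(1, 0, 1),
  stringsAsFactors = FALSE
)

# The new cat's weight is text, not a number.
additional_cat <- data.frame(
  coat = "tabby",
  weight = "2.3 or 2.4",
  likes_catnip = 1,
  stringsAsFactors = FALSE
)

# Binding the rows forces the whole `weight` column to one common type.
cats2 <- rbind(cats, additional_cat)

typeof(cats$weight)   # "double"
typeof(cats2$weight)  # "character"
```

Only `cats2` is affected; the original `cats` object keeps its numeric `weight` column.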

When R reads a csv file, it reads it in as a *data frame*. Thus, when we loaded the `cats`
csv file, it was stored as a data frame. We can recognize data frames by the first row that
is written by the `str()` function:

```{r}
str(cats)
str(cats2)
```

*Data frames* are composed of rows and columns, where each column has the
same number of rows. Different columns in a data frame can be made up of different
data types (this is what makes them so versatile), but everything in a given
column needs to be the same type (e.g., vector, factor, or list).
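
As a small illustration of this rule (a sketch, not part of the commit), the snippet below rebuilds a stand-in for `cats` by hand and asks for the type of each column. The column values are assumed to match the lesson's `feline-data.csv`; the helper vector `mixed` is a hypothetical name used only here.

```r
# Assumption: a hand-built stand-in for the lesson's `cats` data frame.
cats <- data.frame(
  coat = c("calico", "black", "tabby"),
  weight = c(2.1, 5.0, 3.2),
  likes_catnip = c(1, 0, 1),
  stringsAsFactors = FALSE
)

# Each column has exactly one type, but the types differ between columns.
column_types <- sapply(cats, typeof)
column_types

# Within a single vector, mixed values are coerced to one common type.
mixed <- c(2.1, "2.3 or 2.4")
typeof(mixed)  # "character"
```

Here `coat` is stored as character while `weight` and `likes_catnip` are stored as double, which is the versatility the paragraph above describes.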

Let's explore more about different data structures and how they behave.
For now, let's remove that extra line from our cats data and reload it,
while we investigate this behavior further:

feline-data.csv:

```
coat,weight,likes_catnip
calico,2.1,1
black,5.0,0
tabby,3.2,1
```

And back in RStudio:

```{r, eval=FALSE}
cats <- read.csv(file="data/feline-data.csv")
```

```{r, include=FALSE}
cats <- cats_orig
```
Let's explore more about different data structures and how they behave. For now, we will focus on our original data frame `cats` (and we can forget about `cats2` for the rest of this episode).

### Vectors and Type Coercion

@@ -389,8 +361,7 @@ Create a new script in RStudio and copy and paste the following code. Then
move on to the tasks below, which help you to fill in the gaps (\_\_\_\_\_\_).

```
# Read data
cats <- read.csv("data/feline-data_v2.csv")
Using the object `cats2`:
# 1. Print the data
_____
@@ -402,15 +373,15 @@ _____(cats)
# The correct data type is: ____________.
# 4. Correct the 4th weight data point with the mean of the two given values
cats$weight[4] <- 2.35
cats2$weight[4] <- 2.35
# print the data again to see the effect
cats
# 5. Convert the weight to the right data type
cats$weight <- ______________(cats$weight)
cats2$weight <- ______________(cats2$weight)
# Calculate the mean to test yourself
mean(cats$weight)
mean(cats2$weight)
# If you see the correct mean value (and not NA), you did the exercise
# correctly!
@@ -420,7 +391,7 @@ mean(cats$weight)

#### 1\. Print the data

Execute the first statement (`read.csv(...)`). Then print the data to the
Print the data to the
console

::::::::::::::: solution
@@ -435,8 +406,8 @@ Show the content of any variable by typing its name.
Two correct solutions:

```
cats
print(cats)
cats2
print(cats2)
```

:::::::::::::::::::::::::
@@ -445,7 +416,7 @@ print(cats)

The data type of your data is as important as the data itself. Use a
function we saw earlier to print out the data types of all columns of the
`cats` table.
`cats2` `data.frame`.

::::::::::::::: solution

@@ -462,15 +433,14 @@ here.
> ### Solution to Challenge 1.2
>
> ```
> str(cats)
> str(cats2)
> ```
#### 3\. Which data type do we need?
The shown data type is not the right one for this data (weight of
a cat). Which data type do we need?
- Why did the `read.csv()` function not choose the correct data type?
- Fill in the gap in the comment with the correct data type for cat weight!
::::::::::::::: solution
@@ -549,8 +519,8 @@ auto-complete function: Type "`as.`" and then press the TAB key.
> There are two functions that are synonymous for historic reasons:
>
> ```
> cats$weight <- as.double(cats$weight)
> cats$weight <- as.numeric(cats$weight)
> cats2$weight <- as.double(cats2$weight)
> cats2$weight <- as.numeric(cats2$weight)
> ```
::::::::::::::::::::::::::::::::::::::::::::::::::