Commit

Fix some typos
kleintom committed Jan 11, 2025
1 parent f3b95c4 commit f95a94e
Showing 12 changed files with 22 additions and 23 deletions.
2 changes: 1 addition & 1 deletion EDA.qmd
@@ -73,7 +73,7 @@ You can see variation easily in real life; if you measure any continuous variabl
This is true even if you measure quantities that are constant, like the speed of light.
Each of your measurements will include a small amount of error that varies from measurement to measurement.
Variables can also vary if you measure across different subjects (e.g., the eye colors of different people) or at different times (e.g., the energy levels of an electron at different moments).
Every variable has its own pattern of variation, which can reveal interesting information about how that it varies between measurements on the same observation as well as across observations.
Every variable has its own pattern of variation, which can reveal interesting information about how it varies between measurements on the same observation as well as across observations.
The best way to understand that pattern is to visualize the distribution of the variable's values, which you've learned about in @sec-data-visualization.

We'll start our exploration by visualizing the distribution of weights (`carat`) of \~54,000 diamonds from the `diamonds` dataset.
2 changes: 1 addition & 1 deletion data-import.qmd
@@ -56,7 +56,7 @@ read_csv("data/students.csv") |>

We can read this file into R using `read_csv()`.
The first argument is the most important: the path to the file.
You can think about the path as the address of the file: the file is called `students.csv` and that it lives in the `data` folder.
You can think about the path as the address of the file: the file is called `students.csv` and it lives in the `data` folder.

```{r}
#| message: true
9 changes: 4 additions & 5 deletions databases.qmd
@@ -53,7 +53,7 @@ There are three high level differences between data frames and database tables:

Databases are run by database management systems (**DBMS**'s for short), which come in three basic forms:

- **Client-server** DBMS's run on a powerful central server, which you connect from your computer (the client). They are great for sharing data with multiple people in an organization. Popular client-server DBMS's include PostgreSQL, MariaDB, SQL Server, and Oracle.
- **Client-server** DBMS's run on a powerful central server, which you connect to from your computer (the client). They are great for sharing data with multiple people in an organization. Popular client-server DBMS's include PostgreSQL, MariaDB, SQL Server, and Oracle.
- **Cloud** DBMS's, like Snowflake, Amazon's RedShift, and Google's BigQuery, are similar to client server DBMS's, but they run in the cloud. This means that they can easily handle extremely large datasets and can automatically provide more compute resources as needed.
- **In-process** DBMS's, like SQLite or duckdb, run entirely on your computer. They're great for working with large datasets where you're the primary user.

@@ -295,7 +295,7 @@ flights |>
There are two important differences between dplyr verbs and SELECT clauses:

- In SQL, case doesn't matter: you can write `select`, `SELECT`, or even `SeLeCt`. In this book we'll stick with the common convention of writing SQL keywords in uppercase to distinguish them from table or variables names.
- In SQL, order matters: you must always write the clauses in the order `SELECT`, `FROM`, `WHERE`, `GROUP BY`, `ORDER BY`. Confusingly, this order doesn't match how the clauses actually evaluated which is first `FROM`, then `WHERE`, `GROUP BY`, `SELECT`, and `ORDER BY`.
- In SQL, order matters: you must always write the clauses in the order `SELECT`, `FROM`, `WHERE`, `GROUP BY`, `ORDER BY`. Confusingly, this order doesn't match how the clauses are actually evaluated which is first `FROM`, then `WHERE`, `GROUP BY`, `SELECT`, and `ORDER BY`.

The following sections explore each clause in more detail.

@@ -385,7 +385,7 @@ diamonds_db |>
show_query()
```

We'll come back to what's happening with translation `n()` and `mean()` in @sec-sql-expressions.
We'll come back to what's happening with the translation of `n()` and `mean()` in @sec-sql-expressions.

### WHERE

@@ -656,8 +656,7 @@ dbplyr's translations are certainly not perfect, and there are many R functions
In this chapter you learned how to access data from databases.
We focused on dbplyr, a dplyr "backend" that allows you to write the dplyr code you're familiar with, and have it be automatically translated to SQL.
We used that translation to teach you a little SQL; it's important to learn some SQL because it's *the* most commonly used language for working with data and knowing some will make it easier for you to communicate with other data folks who don't use R.
If you've finished this chapter and would like to learn more about SQL.
We have two recommendations:
If you've finished this chapter and would like to learn more about SQL, we have two recommendations:
- [*SQL for Data Scientists*](https://sqlfordatascientists.com) by Renée M. P. Teate is an introduction to SQL designed specifically for the needs of data scientists, and includes examples of the sort of highly interconnected data you're likely to encounter in real organizations.
- [*Practical SQL*](https://www.practicalsql.com) by Anthony DeBarros is written from the perspective of a data journalist (a data scientist specialized in telling compelling stories) and goes into more detail about getting your data into a database and running your own DBMS.
6 changes: 3 additions & 3 deletions functions.qmd
@@ -28,7 +28,7 @@ In this chapter, you'll learn about three useful types of functions:
- Plot functions that take a data frame as input and return a plot as output.

Each of these sections includes many examples to help you generalize the patterns that you see.
These examples wouldn't be possible without the help of folks of twitter, and we encourage follow the links in the comment to see original inspirations.
These examples wouldn't be possible without the help of folks of twitter, and we encourage you to follow the links in the comments to see the original inspirations.
You might also want to read the original motivating tweets for [general functions](https://twitter.com/hadleywickham/status/1571603361350164486) and [plotting functions](https://twitter.com/hadleywickham/status/1574373127349575680) to see even more functions.

### Prerequisites
@@ -175,7 +175,7 @@ These changes illustrate an important benefit of functions: because we've moved

### Mutate functions

Now you've got the basic idea of functions, let's take a look at a whole bunch of examples.
Now that you've got the basic idea of functions, let's take a look at a whole bunch of examples.
We'll start by looking at "mutate" functions, i.e. functions that work well inside of `mutate()` and `filter()` because they return an output of the same length as the input.

Let's start with a simple variation of `rescale01()`.
@@ -460,7 +460,7 @@ diamonds |>
summary6(carat)
```

Furthermore, since the arguments to summarize are data-masking also means that the `var` argument to `summary6()` is data-masking.
Furthermore, since the arguments to summarize are data-masking, so is the `var` argument to `summary6()`.
That means you can also summarize computed variables:

```{r}
2 changes: 1 addition & 1 deletion iteration.qmd
@@ -220,7 +220,7 @@ df_miss |>

If you look carefully, you might intuit that the columns are named using a glue specification (@sec-glue) like `{.col}_{.fn}` where `.col` is the name of the original column and `.fn` is the name of the function.
That's not a coincidence!
As you'll learn in the next section, you can use `.names` argument to supply your own glue spec.
As you'll learn in the next section, you can use the `.names` argument to supply your own glue spec.
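
The effect of a glue spec on column names can be sketched as follows. This is a minimal illustration, assuming dplyr is installed; `df_example` and the two summary functions are hypothetical and not taken from the chapter.

```r
library(dplyr)

# Hypothetical toy data with one missing value per column
df_example <- tibble(a = c(1, NA, 3), b = c(4, 5, NA))

# Named list of summary functions; the names become `.fn` in the glue spec
summary_fns <- list(
  mean = \(x) mean(x, na.rm = TRUE),
  n_miss = \(x) sum(is.na(x))
)

# Without `.names`, across() uses the default spec "{.col}_{.fn}"
default_names <- df_example |>
  summarize(across(c(a, b), summary_fns))

# With `.names`, we supply our own spec; here the function name comes first
custom_names <- df_example |>
  summarize(across(c(a, b), summary_fns, .names = "{.fn}_{.col}"))

names(default_names)
#> [1] "a_mean"   "a_n_miss" "b_mean"   "b_n_miss"
names(custom_names)
#> [1] "mean_a"   "n_miss_a" "mean_b"   "n_miss_b"
```

Only the output names change; the computed values are the same in both results.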

### Column names

2 changes: 1 addition & 1 deletion joins.qmd
@@ -111,7 +111,7 @@ knitr::include_graphics("diagrams/relational.png", dpi = 270)

You'll notice a nice feature in the design of these keys: the primary and foreign keys almost always have the same names, which, as you'll see shortly, will make your joining life much easier.
It's also worth noting the opposite relationship: almost every variable name used in multiple tables has the same meaning in each place.
There's only one exception: `year` means year of departure in `flights` and year of manufacturer in `planes`.
There's only one exception: `year` means year of departure in `flights` and year manufactured in `planes`.
This will become important when we start actually joining tables together.

### Checking primary keys
6 changes: 3 additions & 3 deletions numbers.qmd
@@ -449,7 +449,7 @@ df |>

### Offsets

`dplyr::lead()` and `dplyr::lag()` allow you to refer the values just before or just after the "current" value.
`dplyr::lead()` and `dplyr::lag()` allow you to refer to the values just before or just after the "current" value.
They return a vector of the same length as the input, padded with `NA`s at the start or end:

```{r}
@@ -475,7 +475,7 @@ You can lead or lag by more than one position by using the second argument, `n`.
### Consecutive identifiers
Sometimes you want to start a new group every time some event occurs.
For example, when you're looking at website data, it's common to want to break up events into sessions, where you begin a new session after gap of more than `x` minutes since the last activity.
For example, when you're looking at website data, it's common to want to break up events into sessions, where you begin a new session after a gap of more than `x` minutes since the last activity.
For example, imagine you have the times when someone visited a website:
```{r}
@@ -573,7 +573,7 @@ Here is a selection that you might find useful.
So far, we've mostly used `mean()` to summarize the center of a vector of values.
As we've seen in @sec-sample-size, because the mean is the sum divided by the count, it is sensitive to even just a few unusually high or low values.
An alternative is to use the `median()`, which finds a value that lies in the "middle" of the vector, i.e. 50% of the values is above it and 50% are below it.
An alternative is to use the `median()`, which finds a value that lies in the "middle" of the vector, i.e. 50% of the values are above it and 50% are below it.
Depending on the shape of the distribution of the variable you're interested in, mean or median might be a better measure of center.
For example, for symmetric distributions we generally report the mean while for skewed distributions we usually report the median.
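
To make the contrast concrete, here is a small base R sketch. The two vectors are hypothetical values chosen for illustration, not data from the chapter.

```r
# Hypothetical values: a symmetric vector, and the same vector with
# one unusually high value replacing the largest element
symmetric <- c(8, 9, 10, 11, 12)
skewed <- c(8, 9, 10, 11, 112)

mean(symmetric)    # 10
median(symmetric)  # 10

mean(skewed)       # 30: pulled up by the single large value
median(skewed)     # 10: unchanged, since the middle value is the same
```

A single extreme value triples the mean here, while the median does not move at all.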
2 changes: 1 addition & 1 deletion program.qmd
@@ -49,4 +49,4 @@ The goal of these chapters is to teach you the minimum about programming that yo
Once you have mastered the material here, we strongly recommend that you continue to invest in your programming skills.
We've written two books that you might find helpful.
[*Hands on Programming with R*](https://rstudio-education.github.io/hopr/), by Garrett Grolemund, is an introduction to R as a programming language and is a great place to start if R is your first programming language.
[*Advanced R*](https://adv-r.hadley.nz/) by Hadley Wickham dives into the details of R the programming language; it's great place to start if you have existing programming experience and great next step once you've internalized the ideas in these chapters.
[*Advanced R*](https://adv-r.hadley.nz/) by Hadley Wickham dives into the details of R the programming language; it's a great place to start if you have existing programming experience and a great next step once you've internalized the ideas in these chapters.
8 changes: 4 additions & 4 deletions spreadsheets.qmd
@@ -46,7 +46,7 @@ For the rest of the chapter we will focus on using `read_excel()`.

### Reading Excel spreadsheets {#sec-reading-spreadsheets-excel}

@fig-students-excel shows what the spreadsheet we're going to read into R looks like in Excel. This spreadsheet can be downloaded an Excel file from <https://docs.google.com/spreadsheets/d/1V1nPp1tzOuutXFLb3G9Eyxi3qxeEhnOXUzL5_BcCQ0w/>.
@fig-students-excel shows what the spreadsheet we're going to read into R looks like in Excel. This spreadsheet can be downloaded as an Excel file from <https://docs.google.com/spreadsheets/d/1V1nPp1tzOuutXFLb3G9Eyxi3qxeEhnOXUzL5_BcCQ0w/>.

```{r}
#| label: fig-students-excel
@@ -342,12 +342,12 @@ bake_sale <- tibble(
bake_sale
```

You can write data back to disk as an Excel file using the `write_xlsx()` from the [writexl package](https://docs.ropensci.org/writexl/):
You can write data back to disk as an Excel file using the `write_xlsx()` function from the [writexl package](https://docs.ropensci.org/writexl/):

```{r}
#| eval: false
write_xlsx(bake_sale, path = "data/bake-sale.xlsx")
write_xlsx(bake_sale, path = "data/bake_sale.xlsx")
```

@fig-bake-sale-excel shows what the data looks like in Excel.
@@ -371,7 +371,7 @@ This makes Excel files unreliable for caching interim results as well.
For alternatives, see @sec-writing-to-a-file.

```{r}
read_excel("data/bake-sale.xlsx")
read_excel("data/bake_sale.xlsx")
```

### Formatted output
2 changes: 1 addition & 1 deletion strings.qmd
@@ -622,7 +622,7 @@ If you don't already know the code for your language, [Wikipedia](https://en.wik
Base R string functions automatically use the locale set by your operating system.
This means that base R string functions do what you expect for your language, but your code might work differently if you share it with someone who lives in a different country.
To avoid this problem, stringr defaults to English rules by using the "en" locale and requires you to specify the `locale` argument to override it.
Fortunately, there are two sets of functions where the locale really matters: changing case and sorting.
Fortunately, there are only two sets of functions where the locale really matters: changing case and sorting.

The rules for changing cases differ among languages.
For example, Turkish has two i's: with and without a dot.
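
Both locale-sensitive operations can be sketched briefly. This is a minimal illustration, assuming stringr is installed; the input vectors are chosen here for illustration, and the Czech sorting case is an added example of the same idea.

```r
library(stringr)

# Dotted and dotless lowercase i ("\u0131" is the dotless form)
letters_i <- c("i", "\u0131")

# Default "en" locale: both map to the plain capital I
upper_en <- str_to_upper(letters_i)
# Turkish locale: dotted i maps to the dotted capital ("\u0130")
upper_tr <- str_to_upper(letters_i, locale = "tr")

upper_en
#> [1] "I" "I"
upper_tr
#> [1] "İ" "I"

# Sorting also depends on the locale: in Czech, "ch" is a single
# compound letter that sorts after "h"
words <- c("a", "c", "ch", "h", "z")
sorted_en <- str_sort(words)
sorted_cs <- str_sort(words, locale = "cs")

sorted_en
#> [1] "a"  "c"  "ch" "h"  "z"
sorted_cs
#> [1] "a"  "c"  "h"  "ch" "z"
```

Because the locale is passed explicitly, these calls should give the same results regardless of the operating system's language settings.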
2 changes: 1 addition & 1 deletion transform.qmd
@@ -13,7 +13,7 @@ In this part of the book, you'll learn about the most important types of variabl
#| label: fig-ds-transform
#| echo: false
#| fig-cap: |
#| The options for data transformation depends heavily on the type of
#| The options for data transformation depend heavily on the type of
#| data involved, the subject of this part of the book.
#| fig-alt: |
#| Our data science model, with transform highlighted in blue.
2 changes: 1 addition & 1 deletion workflow-scripts.qmd
@@ -153,7 +153,7 @@ report-2022-04-02.qmd
report-draft-notes.txt
```

Numbering the key scripts make it obvious in which order to run them and a consistent naming scheme makes it easier to see what varies.
Numbering the key scripts makes it obvious in which order to run them and a consistent naming scheme makes it easier to see what varies.
Additionally, the figures are labelled similarly, the reports are distinguished by dates included in the file names, and `temp` is renamed to `report-draft-notes` to better describe its contents.
If you have a lot of files in a directory, taking organization one step further and placing different types of files (scripts, figures, etc.) in different directories is recommended.
