-
Notifications
You must be signed in to change notification settings - Fork 2
/
getting-started.qmd
420 lines (277 loc) · 14.8 KB
/
getting-started.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
# Getting Started {#sec-getting-started}
```{r}
#| echo: false
#| message: false
source("_common.R")
```
> ### Learning Objectives {.unnumbered}
>
> * Get familiar with using R in RStudio
> * Be able to use R to create and store values as objects.
> * Know some of the ways R handles certain things, like spaces.
> * Know how to create comments with the `#` symbol.
> * Know some best practices for staying organized in R with R projects.
## R and RStudio
R is a programming language that runs computations, and RStudio is an interface for working with R with a lot of convenient tools and features. It is the primary [integrated development environment (IDE)](https://en.wikipedia.org/wiki/Integrated_development_environment) for R users.
You can think of the two like this:
* R is like a car's _engine_.
* RStudio is like a car's _dashboard_.
R: Engine | RStudio: Dashboard
:-------------------------:|:-------------------------:
![](images/engine.jpg){ width=200 } | ![](images/dashboard.jpg){ width=250 }
Your car needs an engine (R) to run, but having a speedometer and rear view mirrors (RStudio) makes driving a lot easier.
To get started using R , you need to download and install both R and RStudio (Desktop version) on your computer. Go to the [introduction](intro.html#software) chapter for instructions.
Once you have everything installed, open RStudio. You should see the following:
![A typical RStudio session](images/rstudio_session.png){fig-alt="An image of a typical RStudio session" width=800 fig-aligh="left}
Notice the default panes:
* Console (entire left)
* Environment/History (tabbed in upper right)
* Files/Plots/Packages/Help (tabbed in lower right)
FYI: you can change the default location of the panes, among many other things: [Customizing RStudio](https://support.rstudio.com/hc/en-us/articles/200549016-Customizing-RStudio).
Go into the Console on the left with the `>` (that's the *command prompt*).
Let's get started using R!
## Your first conveRsation
When you type something into the console, R will give you a reply. Think of it like having a conversation with R. For example, let's ask R to add two numbers:
```{r}
3 + 4
```
As you probably expected, R returned `7`. No surprises here!
> Quick note: you can ignore the `[1]` you see in the returned value...that's just R saying there's only one value to return.
But what happens if you ask R to add a number surrounded by quotations marks?
```{r}
#| error: true
3 + "4"
```
Looks like R didn't like that. That's because you asked R to add a number to something that is not a number (`"4"` is a _character_, which is different from the number `4`), so R returned an error message. This is R's what of telling you that you asked it to do something that it can't do.
Here's a helpful tip:
> EMBRACE THE ERROR MESSAGES!
By the end of this course, you will have seen loads of error messages. This doesn't mean you "can't code" or that you're "bad at coding" - it just means you've still got more work to do to solve the problem.
In fact, the best coders sometimes _intentionally_ write code with known errors in it in order to get an error message. This is because when R gives you an error message, most of the time there is a hint in it that can help you solve the problem that led to the error. For example, take a look at the error message from the last example:
```
Error in 3 + "4" : non-numeric argument to binary operator
```
Here R is saying that there was a "non-numeric argument" somewhere. That suggests that the problem might be with something not being a number. As we just discussed, `"4"` is a character, or a "non-numeric argument".
With practice, you'll get better at embracing and interpreting R's error messages.
## Storing values
You can store values by "assigning" them to an _object_ with the `<-` symbol, like this:
```{r}
x <- 2
```
Here the symbol `<-` is meant to look like an arrow. It means "assign the value `2` to the object named `x`".
> PRO TIP: To quickly type `<-`, use the shortcut `option` + `-` (mac) or `alt` + `-` (windows). There are lots of other helpful [shortcuts](https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts). Type `Alt` + `Shift` + `K` to bring up a shortcut reference card).
Since we assigned the value `2` to `x`, if we type `x` into the console and press "enter" R will return the stored value:
```{r}
x
```
If you overwrite an object with a different value, R will "forget" the previous assigned value and only keep the new assignment:
```{r}
x <- 42
x
```
> PRO TIP: Always surround `<-` with spaces to avoid confusion! For example, if you typed `x<-2` (no spaces), it's not clear if you meant `x <- 2` or `x < -2`. The first one assigns `2` to `x`, but the second one compares whether `x` is less than `-2`.
### Use meaningful variable names
You can choose almost any name you like for an object, so long as the name does not begin with a *number* or a special character like `+`, `-`, `*`, `/`, `^`, `!`, `@`, or `&`. But you should always use variable names that **describe the thing you're assigning**. This practice will save you major headaches later when you have lots of objects in your environment.
For example, let's say you have measured the length of a caterpillar and want to store it as an object. Here are three options for creating the object:
**Poor** variable name:
```{r}
x <- 42
```
**Good** variable name:
```{r}
length_mm <- 42
```
**Even better** variable name:
```{r}
caterpillar_length_mm <- 42
```
The first name, `x`, tells us nothing about what the value `42` means (are we counting something? `42` of what?). The second name, `length_mm`, tells us that `42` is the length of something, and that it's measured in millimeters. Finally, the last name, `caterpillar_length_mm`, tells us that `42` is the length of a caterpillar, measured in millimeters.
### Use standard casing styles
<img src="images/horst_casing.jpg" width=600>
[Art by [Allison Horst](https://github.com/allisonhorst/stats-illustrations)]{.aside}
You will be wise to adopt a [convention for demarcating words](https://en.wikipedia.org/wiki/Camel_case) in names. I recommend using one of these:
- `snake_case_uses_underscores`
- `camelCaseUsesCaps`
Make another assignment:
```{r}
this_is_a_long_name <- 2.5
```
To inspect this, try out RStudio's completion facility: type the first few characters, press TAB - voila! RStudio auto-completes the long name for you :)
### R is case sensitive
To understand what this means, try this:
```{r}
cases_matter <- 2
Cases_matter <- 3
```
Let's try to inspect:
```{r}
cases_matter
Cases_matter
```
Although the two objects look_ similar, one has a capital "C", and R stores that as a different object.
In general, type carefully. Typos matter. Case matters. **Get better at typing**.
### The workspace
Look at your workspace in the upper-right pane. The workspace is where user-defined objects accumulate. You can also get a listing of these objects with commands:
```{r}
objects()
ls()
```
If you want to remove the object named `x`, you can do this
```{r}
rm(x)
```
To remove everything, use this:
```{r}
rm(list = ls())
```
or click the broom symbol.
## What else can R do?
R can do a LOT more than what we've seen thus far. For example, you can ask R to print text to the console using the `cat()` function:
```{r}
cat("Hello world!")
```
In the next section, we'll learn more about some of the distinctions between different types of values in R (like numbers and characters).
While R is a programming language, it is perhaps most commonly known as a tool for analyzing data and creating plots. For example, here's how you can use R to make a simple plot of the equation $y = x^2$:
```{r}
#| label: 'simple-plot'
#| warning: false
#| message: false
#| out.width: '75%'
x <- seq(from = -10, to = 10)
y <- x^2
plot(x, y)
lines(x, y)
```
But you can plot way more than equations in R! For example, take a look at this plot of some [actual data about penguins](https://allisonhorst.github.io/palmerpenguins/) (don't worry about the code for now - by the end of this course you'll know what it all does!):
```{r}
#| label: 'mass-flipper'
#| warning: false
#| message: false
#| fig.height: 5
#| fig.width: 7
library(ggplot2)
library(palmerpenguins)
ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
geom_point(aes(color = species, shape = species),
size = 3, alpha = 0.8) +
scale_color_manual(values = c("darkorange","purple","cyan4")) +
theme_minimal() +
labs(title = "Penguin size, Palmer Station LTER",
subtitle = "Flipper length and body mass for Adelie, Chinstrap, and Gentoo Penguins",
x = "Flipper length (mm)",
y = "Body mass (g)",
color = "Penguin species",
shape = "Penguin species") +
theme(legend.position = c(0.2, 0.7),
legend.background = element_rect(fill = "white", color = NA),
plot.title.position = "plot",
plot.caption = element_text(hjust = 0, face= "italic"),
plot.caption.position = "plot")
```
## A couple more important points
### R ignores excess spacing
When I typed `3 + 4` before, I could equally have done this
```{r}
3 + 4
```
or this
```{r}
3 + 4
```
Both produce the same result. The point here is that R ignores extra spaces. This may seem irrelevant for now, but in some programming languages (e.g. Python) blank spaces matter a lot!
This doesn't mean extra spaces _never_ matter. For example, if you wanted to input the value `3.14` but you put a space after the `3`, you'll get an error:
```{r}
#| error: true
3 .14
```
Basically, you can put spaces between _different_ values, and you can put as many as you want and R won't care. But if you break a value up with a space, R will send an error message.
### Using comments
In R, the `#` symbol is a special symbol that denotes a comment. R will ignore anything on the same line that follows the `#` symbol. This enables us to write comments around our code to explain what we're doing:
```{r}
speed <- 55 # This is km/h, not mph!
speed
```
Notice that R ignores the whole sentence after the `#` symbol.
## Staying organized
### The history pane
R keeps track of your "command history." If you click on the console and hit the "up" key, the R console will show you the most recent command that you've typed. Hit it again, and it will show you the command before that, and so on.
The second way to get access to your command history is to look at the history panel in Rstudio. On the upper right hand side of the Rstudio window you'll see a tab labeled "History." Click on that and you'll see a list of all your recent commands displayed in that panel. It should look something like this:
<center>
<img src="images/rstudio_history.png" width=600>
</center>
If you double click on one of the commands, it will be copied to the R console.
### Working directory
Any process running on your computer has a notion of its "working directory". In R, this is where R will look for files you ask it to load. It's also where any files you write to disk will go.
You can explicitly check your working directory with:
```{r}
#| eval: false
getwd()
```
It is also displayed at the top of the RStudio console.
As a beginning R user, it's OK let your home directory or any other weird directory on your computer be R's working directory. _Very soon_, I urge you to evolve to the next level, where you organize your analytical projects into directories and, when working on project A, set R's working directory to the associated directory.
__Although I do not recommend it__, in case you're curious, you can set R's working directory at the command line like so:
```{r}
#| eval: false
setwd("~/myCoolProject")
```
__Although I do not recommend it__, you can also use RStudio's Files pane to navigate to a directory and then set it as working directory from the menu:
> Session > Set Working Directory > To Files Pane Location.
You'll see even more options there). Or within the Files pane, choose __More__ and __Set As Working Directory__.
But there's a better way. A way that also puts you on the path to managing your R work like an expert.
### RStudio projects
Keeping all the files associated with a project organized together -- input data, R scripts, analytical results, figures -- is such a wise and common practice that RStudio has built-in support for this via its _projects_.
[Using Projects](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects)
Let's make one for practice. Do this:
> File > New Project ....
You should see the following pane:
<center>
<img src="images/rstudio_new_project.png" width=500>
</center>
Choose "New Directory". The directory name you choose here will be the project name. Call it whatever you want. RStudio will create a folder with that name to put all your project files.
As a demo, I created a project on my Desktop called "demo". RStudio created a new project called "demo", and in this folder there is a file called "demo.Rproj". If I double-click on this file, RStudio will open up, and my working directory will be automatically set to this folder! You can double check this by typing:
```{r}
#| eval: false
getwd()
```
### Save your code in .R Files
It is traditional to save R scripts with a `.R` or `.r` suffix. Any code you wish to re-run again later should be saved in this way and stored within your project folder. For example, if you wanted to re-run all of the code in this tutorial, open a new `.R` file and save it to your R project folder. Do this:
> File > New File > R Script
You can copy some of the code we've typed so far into this file to re-run it again later:
```{r}
#| eval: false
3 + 4
3 + "4"
x <- 2
x
x <- 42
x
this_is_a_long_name <- 2.5
cases_matter <- 2
Cases_matter <- 3
cases_matter
Cases_matter
objects()
ls()
rm(x)
rm(list = ls())
cat("Hello world!")
x <- seq(from = -10, to = 10)
y <- x^2
plot(x, y)
lines(x, y)
3 + 4
3 + 4
2 + 2 # I'm adding two numbers
getwd()
```
Then save this new R script with some name. Do this:
> File > Save
I called the file "tutorial.R" and saved it in my R project folder called "demo".
Now when I open the "demo.Rproj" file, I see in my files pane the "tutorial.R" code script. I can click on that file and continue editing it!
I can also run any line in the script by typing "Command + Enter" (Mac) or "Control + Enter" (Windows).
## Page sources {.unnumbered}
Some content on this page has been modified from other courses, including:
- Palmer penguins plot from the [**palmerpenguins**](https://github.com/allisonhorst/palmerpenguins/) package by Allison Horst.
- "Case" art by [Allison Horst](https://github.com/allisonhorst/stats-illustrations)
- Danielle Navarro's book ["Learning Statistics With R"](https://learningstatisticswithr.com/book/introR.html)
- Jenny Bryan's [STAT 545 Course](http://stat545.com/)
- [Modern Dive](https://moderndive.netlify.com/), by Chester Ismay & Albert Y. Kim.