-
Notifications
You must be signed in to change notification settings - Fork 0
/
workflow_rstudio.qmd
158 lines (103 loc) · 7.43 KB
/
workflow_rstudio.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
# Workflow in RStudio
```{r}
#| results: "asis"
#| echo: false
source("_common.R")
status("polishing")
```
This chapter give you some tips for make your workflow in RStudio easier.
## RStudio Projects
### What is an RStudio Project?
Using an RStudio Project will help you organise your analysis work, make it much easier to manage working directories and paths and also to collaborate with others including yourself on another computer! An RStudio Project is a folder that contains a file with the extension `.RProj` and all the code, data, and other files associated with a particular piece of work.
For example, if you were analysing some data on stem cells you might have an RStudio Project called "stem-cells". This would be a folder called `stem-cells`, known as the project folder, which contains the `stem-cells.Rproj` file - both of these are created automatically. Then you might create folders for the data and for figures from the analysis along with the script that contains code for importing the data, doing the analysis, creating the figures and writing the figures to file.
```
-- stem-cells
|__stem-cells.Rproj
|__analysis.R
|__data-raw
|__2019-03-21_donor_1.csv
|__2019-03-21_donor_2.csv
|__2019-03-21_donor_3.csv
|__figures
|__01_volcano_donor_1_vs_donor_2.png
|__02_volcano_donor_1_vs_donor_3.png
```
When you open an RStudio Project, it automatically sets the project directory as the working directory. This means when you write your code with paths relative to the project directory your code will work the same on any computer you send that RStudio Project to.
### Creating an RStudio Project
- Click on the drop-down menu on top right where it says "Project: (None)" and choose New Project
- A dialogue box will appear. Choose "New Directory", then "New Project"
- Click the Browse button next to "Create project as a subdirectory of:" to navigate to a place in your file system where you want to create the project folder.
- Type a name the "Directory name". This should be something that helps you identify the contents. Follow the advice in [Naming things](organising_work.html#naming-things)
## Some useful settings
You can adjust some of the default settings in RStudio you suit your own needs better. The settings are accessed through the Tools Menu under Global options. I like the following settings:
- When using RStudio Projects the working directory is the Project directory but when you start RStudio up and want to make a new project you might find the default location doesn't suit you. You can [change the default directory when not in a project](first_steps_rstudio.html#changing-some-defaults-to-make-life-easier).
- To ensure you have a fresh session with no R objects in the workspace you can [change Workspace options](first_steps_rstudio.html#changing-some-defaults-to-make-life-easier).
- In the Code options
- Display - check Use rainbow brackets which makes it easier to see which bracket are pairs
- Display - check Show margin which will add a line at 80 characters to help you use new lines more often and not create very long lines of code that are difficult to read
- Diagnostics - check Show diagnostics for R which will put a marker on the line that includes a syntax error and make a suggestion what the error is
- Diagnostics - check Provide R style diagnostics (e.g. whitespace) which will help you layout your code
-
## Handy housekeeping command
### Where am I?
There are several ways you can find out what your working directory is.
1. Code. The `getwd()` (**get** **w**orking **d**irectory)
```{r}
#| eval: false
getwd()
```
2. Along the top of the Console window. There is also a little arrow you can click to show your working directory in the Files pane.
3. In the Files pane, provided to have not navigated around in there. If you have, you can view your working directory using blue wheel and choosing "Go To Working Directory" or by using the arrow on the top of the Console window
### What files can I see?
There are several ways you can see the files and folders in your working directory.
1. Code. The `dir()` (**dir**ectory)
```{r}
#| eval: false
dir()
```
- `dir()` list the contents of the working directory
- `dir("..")` list the contents of the directory above the working directory
- `dir(../..)` list the contents of the directory two directories above the working directory
- `dir("data-raw")` list the contents of a folder call data-raw which is in the working directory.
2. Look in the the Files pane, provided to have not navigated around in there. If you have, you can view your working directory using blue wheel and choosing "Go To Working Directory" or by using the arrow on the top of the Console window
### What R objects can I see?
1. Code. The `ls()` (**l**i**s**t)
```{r}
#| eval: false
ls()
```
2. Look in the the Environment pane
## Understanding the pipe `|>`
The pipe operator improves code readability by:
- structuring sequences of data operations left-to-right and top to bottom rather than from inside and out),
- minimizing the need for intermediates,
- making it easy to add steps anywhere in the sequence of operations.
For example, suppose we want to apply a log-square root transformation which is sometimes applied to make a flat distribution more normal. There are two approaches we could use without the pipe: nesting the functions and creating an intermediate. We will consider both of these. First, let us generate a few numbers of work with:
```{r}
# generate some numbers
# this will give me ten random numbers between 1 and 100
nums <- sample(1:100, size = 10)
```
To apply the transformation we can nest the functions so the output put of `sqrt(nums)` becomes the input of `log()`:
```{r}
# apply a log-square root transformation
tnums <- log(sqrt(nums))
tnums
```
The first function to be applied is innermost. When we are using just two functions, the level of nesting does not cause too much difficulty in reading the code. However, you can image this gets more unreadable as the number of functions applied increases. It also makes it harder to debug and find out where an error might be. One solution is to create intermediate variables so the commands a given in order:
```{r}
# apply a log-square root transformation
sqrtnums <- sqrt(nums)
tnums <- log(sqrtnums)
```
Using intermediates make your code easier to follow at first but clutters up your environment and code with variables you don't care about. You also start of run out names!
The pipe is a more elegant and readable solution. It allows you to send the output of one operation as input to the next function. The pipe has long been used by Unix operating systems (where the pipe operator is `|`). The R pipe operator is `|>`, a short cut for which is Ctrl+Shift+M.
Using the pipe, we can apply out transformation with:
```{r}
tnums <- nums |>
sqrt() |>
log()
```
The command are in the order applied, there are no intermediates and the code is easier to debug and to build-up step-by-step..
Note that `|>` is the pipe that comes with base R which was only added in the last couple of years. Before it existed, `**tidyverse**` had a pipe operator provided by the `*magrittr**` package. The magrittr pipe is `%>%`. In your googling, you may well see code written using the `%>%`. In most cases, the pipes are interchangeable.
[What They Forgot to Teach You About R](https://rstats.wtf/) @bryanWFT