---
title: R code
layout: default
output: bookdown::html_chapter
---
# R code {#r}
The first principle of using a package is that all R code goes in `R/`. In this chapter, you'll learn about the `R/` directory, my recommendations for organising your functions into files, and some general tips on good style. You'll also learn about some important differences between functions in scripts and functions in packages.
## R code workflow {#r-workflow}
The first advantage to using a package is that it's easy to re-load your code. You can either run `devtools::load_all()`, or in RStudio press __Ctrl/Cmd + Shift + L__, which also saves all open files, saving you a keystroke.
This keyboard shortcut leads to a fluid development workflow:
1. Edit an R file.
1. Press Ctrl/Cmd + Shift + L.
1. Explore the code in the console.
1. Rinse and repeat.
Congratulations! You've learned your first package development workflow. Even if you learn nothing else from this book, you'll have gained a useful workflow for editing and reloading R code.
## Organising your functions {#r-organising}
While you're free to arrange functions into files as you wish, the two extremes are bad: don't put all functions into one file and don't put each function into its own separate file. (It's OK if some files only contain one function, particularly if the function is large or has a lot of documentation.) File names should be meaningful and end in `.R`.
```{r, eval = FALSE}
# Good
fit_models.R
utility_functions.R
# Bad
foo.r
stuff.r
```
Pay attention to capitalisation, since you, or some of your collaborators, might be using an operating system with a case-insensitive file system (e.g., Microsoft Windows). Avoid problems by never using filenames that differ only in capitalisation.
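One way to check for this is to compare file names after converting them to lower case. The following is a minimal sketch in base R; the file names are made up, and in a real package you would start from `list.files("R")` instead of a hard-coded vector.

```{r, eval = FALSE}
# Stand-in for list.files("R"); these file names are made up.
r_files <- c("fit_models.R", "Fit_Models.R", "utility_functions.R")

# Compare names after folding case: any duplicate is a potential clash.
folded <- tolower(r_files)
clashes <- r_files[folded %in% folded[duplicated(folded)]]
clashes
```

An empty result means no two files differ only in capitalisation.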
My rule of thumb is that if I can't remember the name of the file where a function lives, I need to either separate the functions into more files or give the file a better name. (Unfortunately you can't use subdirectories inside `R/`. The next best thing is to use a common prefix, e.g., `abc-*.R`.)
The arrangement of functions within files is less important if you master two important RStudio keyboard shortcuts that let you jump to the definition of a function:
* Click a function name in code and press __F2__.
* Press __Ctrl + .__ then start typing the name:
```{r, echo = FALSE}
bookdown::embed_png("screenshots/file-finder.png", dpi = 220)
```
After navigating to a function using one of these tools, you can go back to where you were by clicking the back arrow at the top-left of the editor (`r bookdown::embed_png("screenshots/arrows.png", dpi = 240)`), or by pressing Ctrl/Cmd-F9.
## Code style {#style}
Good coding style is like using correct punctuation. You can manage without it, but it sure makes things easier to read. As with styles of punctuation, there are many possible variations. The following guide describes the style that I use (in this book and elsewhere). It is based on Google's [R style guide](https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml), with a few tweaks.
You don't have to use my style, but I strongly recommend that you use a consistent style and that you document it. If you're working on someone else's code, don't impose your own style. Instead, read their style documentation and follow it as closely as possible.
Good style is important because while your code only has one author, it will usually have multiple readers. This is especially true when you're writing code with others. In that case, it's a good idea to agree on a common style up-front. Since no style is strictly better than another, working with others may mean that you'll need to sacrifice some preferred aspects of your style.
The formatR package, by Yihui Xie, makes it easier to clean up poorly formatted code. It can't do everything, but it can quickly get your code from terrible to pretty good. Make sure to read [the notes on the website](http://yihui.name/formatR/) before using it. It's as easy as:
```{r, eval = FALSE}
install.packages("formatR")
formatR::tidy_dir("R")
```
### Object names
Variable and function names should be lowercase. Use an underscore (`_`) to separate words within a name (reserve `.` for S3 methods). Camel case is a legitimate alternative, but be consistent! Generally, variable names should be nouns and function names should be verbs. Strive for names that are concise and meaningful (this is not easy!).
```{r, eval = FALSE}
# Good
day_one
day_1
# Bad
first_day_of_the_month
DayOne
dayone
djm1
```
Where possible, avoid using names of existing functions and variables. Reusing them will cause confusion for the readers of your code.
```{r, eval = FALSE}
# Bad
T <- FALSE
c <- 10
mean <- function(x) sum(x)
```
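To see why this is confusing, here is a small sketch of what happens after the last line above: the new definition masks the base function, so a reader who sees `mean()` gets a sum instead. The variable names `masked` and `real` are only for illustration.

```{r, eval = FALSE}
mean <- function(x) sum(x)  # masks base::mean() in the global environment

masked <- mean(c(1, 2, 3))        # calls the masking function, which sums
real <- base::mean(c(1, 2, 3))    # the real mean is still reachable via ::

rm(mean)  # remove the mask; mean() refers to base::mean() again
```

Here `masked` is 6 while `real` is 2, even though both calls look like they compute a mean.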
### Spacing
Place spaces around all infix operators (`=`, `+`, `-`, `<-`, etc.). The same rule applies when using `=` in function calls. Always put a space after a comma, and never before (just like in regular English).
```{r, eval = FALSE}
# Good
average <- mean(feet / 12 + inches, na.rm = TRUE)
# Bad
average<-mean(feet/12+inches,na.rm=TRUE)
```
There's a small exception to this rule: `:`, `::` and `:::` don't need spaces around them. (If you haven't seen `::` or `:::` before, don't worry - you'll learn all about them in [namespaces](#namespace).)
```{r, eval = FALSE}
# Good
x <- 1:10
base::get
# Bad
x <- 1 : 10
base :: get
```
Place a space before left parentheses, except in a function call.
```{r, eval = FALSE}
# Good
if (debug) do(x)
plot(x, y)
# Bad
if(debug)do(x)
plot (x, y)
```
Extra spacing (i.e., more than one space in a row) is ok if it improves alignment of equal signs or assignments (`<-`).
```{r, eval = FALSE}
list(
  total = a + b + c,
  mean  = (a + b + c) / n
)
```
Do not place spaces around code in parentheses or square brackets (unless there's a comma, in which case see above).
```{r, eval = FALSE}
# Good
if (debug) do(x)
diamonds[5, ]
# Bad
if ( debug ) do(x) # No spaces around debug
x[1,] # Needs a space after the comma
x[1 ,] # Space goes after comma not before
```
### Curly braces
An opening curly brace should never go on its own line and should always be followed by a new line. A closing curly brace should always go on its own line, unless it's followed by `else`.
Always indent the code inside curly braces.
```{r, eval = FALSE}
# Good
if (y < 0 && debug) {
  message("Y is negative")
}
if (y == 0) {
  log(x)
} else {
  y ^ x
}
# Bad
if (y < 0 && debug)
message("Y is negative")
if (y == 0) {
  log(x)
}
else {
  y ^ x
}
```
It's ok to leave very short statements on the same line:
```{r, eval = FALSE}
if (y < 0 && debug) message("Y is negative")
```
### Line length
Strive to limit your code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font. If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function.
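If you want to find the offending lines, one possible approach is to count characters per line with `nchar()`. This is only a sketch: the `code` vector stands in for the result of `readLines()` on one of your files, and the 80-character limit is the one suggested above.

```{r, eval = FALSE}
# Stand-in for readLines("R/some-file.R"); the file name is hypothetical.
code <- c(
  "short_line <- 1",
  paste0("long_line <- c(", paste(rep("1000000", 12), collapse = ", "), ")")
)

# nchar(type = "chars") counts characters rather than bytes.
too_long <- which(nchar(code, type = "chars") > 80)
too_long
```

`too_long` holds the line numbers that exceed the limit (here, only the second line).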
### Indentation
When indenting your code, use two spaces. Never use tabs or mix tabs and spaces. Change these options in the code preferences pane:
```{r, echo = FALSE}
bookdown::embed_png("screenshots/style-options.png", dpi = 220)
```
The only exception is if a function definition runs over multiple lines. In that case, indent the second line to where the definition starts:
```{r, eval = FALSE}
long_function_name <- function(a = "a long argument",
                               b = "another argument",
                               c = "another long argument") {
  # As usual code is indented by two spaces.
}
```
### Assignment
Use `<-`, not `=`, for assignment.
```{r}
# Good
x <- 5
# Bad
x = 5
```
### Commenting guidelines
Comment your code. Each line of a comment should begin with the comment symbol and a single space: `# `. Comments should explain the why, not the what. \index{comments}
Use commented lines of `-` and `=` to break up your file into easily readable chunks.
```{r, eval = FALSE}
# Load data ---------------------------
# Plot data ---------------------------
```
## Differences between functions in scripts and in packages {#r-differences}
Code in a package should not have side effects. Your code should only create objects (mostly functions), and you should not call functions that affect the global state. This means:
* __Don't use `library()` or `require()`__. Use the [DESCRIPTION](description.html)
to specify your package's requirements.
* __Never use `source()`__ to load code from a file. Rely on
`devtools::load_all()` to automatically source all files in `R/`.
* __Don't modify global `options()` or graphics `par()`__. Put state changing
operations in functions that the user can call when they want.
* __Don't save files to disk with `write()`, `write.csv()`, or `saveRDS()`__.
Use [data/](data.html) to cache important data files.
There are two reasons to avoid side-effects. The first reason is pragmatic: functions with side-effects will work while you're developing a package locally with `devtools::load_all()`, but they won't work once the package is installed and loaded. This is because your R code is run only once, when the package is built, and not every time it's loaded. The second reason is principled: you shouldn't change global state behind your users' backs.
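A related pattern, when a function of yours really does need to change an option temporarily, is to restore the previous value before returning. The following is a minimal sketch rather than a recommendation from this chapter; `print_short()` is a hypothetical helper, and it relies on `options()` returning the old values and on `on.exit()` running when the function exits.

```{r, eval = FALSE}
# Hypothetical helper: print a number with a temporary `digits` setting.
print_short <- function(x, digits = 3) {
  old <- options(digits = digits)    # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore them however the function exits
  print(x)
  invisible(x)
}

print_short(pi)
getOption("digits")  # same value as before the call
```

Because the change is confined to the body of a function the user calls explicitly, nothing happens at load time and the user's settings are left as they were.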
### When you __do__ need side-effects
Occasionally, packages do need side-effects. This is most common if your package talks to an external system: you might need to do some initial setup when the package loads. To do that, you can use two special functions: `.onLoad()` and `.onAttach()`. These are called when the package is loaded and attached, respectively. You'll learn about the distinction between the two in [Namespaces](#namespace). For now, you should always use `.onLoad()` unless explicitly directed otherwise.
Some common uses of `.onLoad()` and `.onAttach()` are:
* To dynamically load a compiled DLL. In most cases, you won't need to
use `.onLoad()` to do this. Instead, you'll use a special namespace
construct; see [namespaces](#namespace) for details.
* To display an informative message when the package loads. This might make
usage conditions clear, or display useful tips. Startup messages are one
place where you should use `.onAttach()` instead of `.onLoad()`. To display
startup messages, always use `packageStartupMessage()`, and not `message()`.
(This allows `suppressPackageStartupMessages()` to selectively suppress
package startup messages).
```{r, eval = FALSE}
.onAttach <- function(libname, pkgname) {
  packageStartupMessage("Welcome to my package")
}
```
* To connect R to another programming language. For example, if you use rJava
to talk to a `.jar` file, you need to call `rJava::.jpackage()`. To
make C++ classes available as reference classes in R with Rcpp modules,
you call `Rcpp::loadRcppModules()`.
* To register vignette engines with `tools::vignetteEngine()`.
* To set custom options for your package with `options()`. To avoid conflicts
with other packages, ensure that you prefix option names with the name
of your package. Also be careful not to override options that the user
has already set.
I use the following code in devtools to set up useful options:
```{r, eval = FALSE}
.onLoad <- function(libname, pkgname) {
  op <- options()
  op.devtools <- list(
    devtools.path = "~/R-dev",
    devtools.install.args = "",
    devtools.name = "Your name goes here",
    devtools.desc.author = '"First Last <first.last@example.com> [aut, cre]"',
    devtools.desc.license = "What license is it under?",
    devtools.desc.suggests = NULL,
    devtools.desc = list()
  )
  # Only set the options that the user has not already set.
  toset <- !(names(op.devtools) %in% names(op))
  if (any(toset)) options(op.devtools[toset])
  invisible()
}
```
As you can see in the examples, `.onLoad()` and `.onAttach()` are called with two arguments: `libname` and `pkgname`. They're rarely used (they're a holdover from the days when you needed to use `library.dynam()` to load compiled code). They give the path where the package is installed (the "library"), and the name of the package.
If you use `.onLoad()`, consider using `.onUnload()` to clean up any side effects. By convention, `.onLoad()` and friends are usually saved in a file called `zzz.R`. (Note that `.First.lib()` and `.Last.lib()` are old versions of `.onLoad()` and `.onUnload()` and should no longer be used.)
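As a rough illustration of pairing the two hooks, the sketch below sets default options on load and removes only those defaults on unload. The package name `mypkg`, its option names, and the `.state` environment are all hypothetical; the code assumes that setting an option to `NULL` removes it.

```{r, eval = FALSE}
# Hypothetical package-internal environment remembering which options we set.
.state <- new.env(parent = emptyenv())

.onLoad <- function(libname, pkgname) {
  defaults <- list(mypkg.verbose = FALSE, mypkg.cache_dir = tempdir())
  # Only set the options that the user has not already set.
  toset <- !(names(defaults) %in% names(options()))
  if (any(toset)) options(defaults[toset])
  .state$set_by_us <- names(defaults)[toset]
  invisible()
}

.onUnload <- function(libpath) {
  # Remove only the options this package created; leave user settings alone.
  unset <- .state$set_by_us
  if (length(unset) > 0) {
    options(stats::setNames(vector("list", length(unset)), unset))
  }
  invisible()
}
```

Tracking which options were actually set on load means that unloading does not discard a value the user chose themselves.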
### S4 classes, generics and methods
Another type of side-effect is defining S4 classes, methods and generics. R packages capture these side-effects so they can be replayed when the package is loaded, but they need to be called in the right order. For example, before you can define a method, you must have defined both the generic and the class. This requires that the R files be sourced in a specific order. This order is controlled by the `Collate` field in the `DESCRIPTION`. This is described in more detail in [documenting S4](#man-s4).
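The ordering can be sketched with a toy example. The class `Person`, the generic `greet()`, and the file names in the comments are all hypothetical; the point is only that the class and the generic are defined before the method that uses them.

```{r, eval = FALSE}
# 1. Class (hypothetical file: R/class-person.R)
methods::setClass("Person", slots = c(name = "character"))

# 2. Generic (hypothetical file: R/generic-greet.R)
methods::setGeneric("greet", function(x) standardGeneric("greet"))

# 3. Method, defined after both the class and the generic
#    (hypothetical file: R/method-greet.R)
methods::setMethod("greet", "Person", function(x) paste("Hello,", x@name))

greet(methods::new("Person", name = "Ada"))
```

If these definitions lived in separate files, the `Collate` field would need to list the files in this order.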
## CRAN notes {#r-cran}
(Each chapter will finish with some hints for submitting your package to CRAN. If you don't plan on submitting your package to CRAN, feel free to ignore them!)
If you're planning on submitting your package to CRAN, you must use only ASCII characters in your `.R` files. You can still include Unicode characters in strings, but you need to use the special Unicode escape format, `"\u1234"`. The easiest way to do that is to use `stringi::stri_escape_unicode()`:
```{r}
x <- "This is a bullet •"
y <- "This is a bullet \u2022"
identical(x, y)
cat(stringi::stri_escape_unicode(x))
```
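To find lines that need escaping in the first place, one option is to look for code points above the ASCII range. This is a base R sketch under the assumption that the file is read as UTF-8; the `lines` vector stands in for the result of `readLines()`, and the file name in the comment is hypothetical.

```{r, eval = FALSE}
# Stand-in for readLines("R/some-file.R", encoding = "UTF-8").
lines <- c("x <- 1", "y <- \"This is a bullet \u2022\"")

# utf8ToInt() gives the code points of one string; ASCII is 0 to 127.
has_non_ascii <- vapply(
  lines,
  function(line) any(utf8ToInt(line) > 127L),
  logical(1),
  USE.NAMES = FALSE
)
which(has_non_ascii)
```

The result gives the line numbers that contain at least one non-ASCII character (here, the second line).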
Your R directory should not include any files other than R code. Subdirectories will be silently ignored.