vignettes/translation-function.Rmd
16 additions & 16 deletions
Original file line number
Diff line number
Diff line change
@@ -30,15 +30,15 @@ con <- simulate_dbi()
30
30
translate_sql((x + y) / 2, con = con)
31
31
```
32
32
33
-
`translate_sql()` takes an optional `con` parameter. If not supplied, this causes dplyr to generate (approximately) SQL-92 compliant SQL. If supplied, dplyr uses `sql_translation()` to look up a custom environment which makes it possible for different databases to generate slightly different SQL: see `vignette("new-backend")` for more details. You can use the various simulate helpers to see the translations used by different backends:
33
+
`translate_sql()` takes an optional `con` parameter. If not supplied, this causes dbplyr to generate (approximately) SQL-92 compliant SQL. If supplied, dbplyr uses `sql_translation()` to look up a custom environment which makes it possible for different databases to generate slightly different SQL: see `vignette("new-backend")` for more details. You can use the various simulate helpers to see the translations used by different backends:
34
34
35
35
```{r}
36
36
translate_sql(x ^ 2L, con = con)
37
37
translate_sql(x ^ 2L, con = simulate_sqlite())
38
38
translate_sql(x ^ 2L, con = simulate_access())
39
39
```
40
40
41
-
Perfect translation is not possible because databases don't have all the functions that R does. The goal of dplyr is to provide a semantic rather than a literal translation: what you mean, rather than precisely what is done. In fact, even for functions that exist both in databases and R, you shouldn't expect results to be identical; database programmers have different priorities than R core programmers. For example, in R in order to get a higher level of numerical accuracy, `mean()` loops through the data twice. R's `mean()` also provides a `trim` option for computing trimmed means; this is something that databases do not provide.
41
+
Perfect translation is not possible because databases don't have all the functions that R does. The goal of dbplyr is to provide a semantic rather than a literal translation: what you mean, rather than precisely what is done. In fact, even for functions that exist both in databases and in R, you shouldn't expect results to be identical; database programmers have different priorities than R core programmers. For example, R's `mean()` loops through the data twice to get a higher level of numerical accuracy. R's `mean()` also provides a `trim` option for computing trimmed means; this is something that databases do not provide.
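
As a small illustration of the `trim` point, the following base R sketch shows what a trimmed mean computes. No database is involved, and the vector `x` is made up for this example:

```{r}
# Hypothetical data with one large outlier
x <- c(1, 2, 3, 100)

# Plain arithmetic mean: pulled upwards by the outlier
plain <- mean(x)

# Trimmed mean: drops 25% of the observations from each end before averaging
trimmed <- mean(x, trim = 0.25)

plain
trimmed
```

A database backend has no direct counterpart to the second call, so a semantic translation of `mean()` can only cover the untrimmed case.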
42
42
43
43
If you're interested in how `translate_sql()` is implemented, the basic techniques behind it are described in ["Advanced R"](https://adv-r.hadley.nz/translation.html).
44
44
@@ -63,7 +63,7 @@ The following examples work through some of the basic differences between R and
63
63
```
64
64
65
65
* R and SQL have different defaults for integers and reals.
66
-
In R, 1 is a real, and 1L is an integer. In SQL, 1 is an integer, and 1.0 is a real
66
+
In R, 1 is a real, and 1L is an integer. In SQL, 1 is an integer, and 1.0 is a real.
67
67
68
68
```{r}
69
69
translate_sql(1, con = con)
@@ -104,7 +104,7 @@ dbplyr no longer translates `%/%` because there's no robust cross-database trans
104
104
105
105
### Aggregation
106
106
107
-
All database provide translation for the basic aggregations: `mean()`, `sum()`, `min()`, `max()`, `sd()`, `var()`. Databases automatically drop NULLs (their equivalent of missing values) whereas in R you have to ask nicely. The aggregation functions warn you about this important difference:
107
+
All databases provide translation for the basic aggregations: `mean()`, `sum()`, `min()`, `max()`, `sd()`, `var()`. Databases automatically drop NULLs (their equivalent of missing values) whereas in R you have to ask nicely. The aggregation functions warn you about this important difference:
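
The R half of this difference can be checked in base R alone. In this sketch the vector `x` is made up for illustration; the `na.rm = TRUE` calls correspond to what a SQL aggregate does by default when it meets NULLs:

```{r}
# Hypothetical data with one missing value
x <- c(1, NA, 3)

# By default, R propagates the missing value
default_mean <- mean(x)

# You have to ask R to drop missing values, which SQL aggregates do automatically
dropped_mean <- mean(x, na.rm = TRUE)
dropped_sum <- sum(x, na.rm = TRUE)

default_mean
dropped_mean
dropped_sum
```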
`if` and `switch()` are translated to `CASE WHEN`:
123
123
124
124
```{r}
125
125
translate_sql(if (x > 5) "big" else "small", con = con)
@@ -135,7 +135,7 @@ translate_sql(switch(x, a = 1L, b = 2L, 3L), con = con)
135
135
136
136
## Unknown functions
137
137
138
-
Any function that dplyr doesn't know how to convert is left as is. This means that database functions that are not covered by dplyr can often be used directly via `translate_sql()`.
138
+
Any function that dbplyr doesn't know how to convert is left as is. This means that database functions that are not covered by dbplyr can often be used directly via `translate_sql()`.
139
139
140
140
### Prefix functions
141
141
@@ -145,15 +145,15 @@ Any function that dbplyr doesn't know about will be left as is:
145
145
translate_sql(foofify(x, y), con = con)
146
146
```
147
147
148
-
Because SQL functions are general case insensitive, I recommend using upper case when you're using SQL functions in R code. That makes it easier to spot that you're doing something unusual:
148
+
Because SQL functions are generally case insensitive, I recommend using upper case when you're using SQL functions in R code. That makes it easier to spot that you're doing something unusual:
149
149
150
150
```{r}
151
151
translate_sql(FOOFIFY(x, y), con = con)
152
152
```
153
153
154
154
### Infix functions
155
155
156
-
As well as prefix functions (where the name of the function comes before the arguments), dbplyr also translates infix functions. That allows you to use expressions like `LIKE` which does a limited form of pattern matching:
156
+
As well as prefix functions (where the name of the function comes before the arguments), dbplyr also translates infix functions. That allows you to use expressions like `LIKE`, which does a limited form of pattern matching:
157
157
158
158
```{r}
159
159
translate_sql(x %LIKE% "%foo%", con = con)
@@ -190,7 +190,7 @@ mf %>%
190
190
191
191
### Error for unknown translations
192
192
193
-
If needed, you can also force dbplyr to error if it doesn't know how to translate a function with the `dplyr.strict_sql` option:
193
+
If needed, you can also use the `dplyr.strict_sql` option to force dbplyr to error if it doesn't know how to translate a function:
194
194
195
195
```{r}
196
196
#| error = TRUE
@@ -245,16 +245,16 @@ Things get a little trickier with window functions, because SQL's window functio
245
245
knitr::include_graphics("windows.png", dpi = 300)
246
246
```
247
247
248
-
Of the many possible specifications, there are only three that commonly
248
+
Of the many possible specifications, only three are commonly
249
249
used. They select between aggregation variants:
250
250
251
-
* Recycled: `BETWEEN UNBOUND PRECEEDING AND UNBOUND FOLLOWING`
251
+
* Recycled: `BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`
252
252
253
-
* Cumulative: `BETWEEN UNBOUND PRECEEDING AND CURRENT ROW`
253
+
* Cumulative: `BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`
254
254
255
-
* Rolling: `BETWEEN 2 PRECEEDING AND 2 FOLLOWING`
255
+
* Rolling: `BETWEEN 2 PRECEDING AND 2 FOLLOWING`
256
256
257
-
dplyr generates the frame clause based on whether your using a recycled
257
+
dbplyr generates the frame clause based on whether you're using a recycled
258
258
aggregate or a cumulative aggregate.
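
To make the three frame variants concrete, here is a minimal base R sketch of what each one computes for a mean over a single partition. It does not use dbplyr or a database; the vector `x` and the variable names are made up for illustration:

```{r}
# Hypothetical ordered values within one partition
x <- c(1, 2, 3, 4, 5)

# Recycled: every row sees the whole partition
recycled <- rep(mean(x), length(x))

# Cumulative: each row sees everything up to and including itself
cumulative <- cumsum(x) / seq_along(x)

# Rolling: each row sees up to 2 rows before and 2 rows after itself
rolling <- vapply(
  seq_along(x),
  function(i) mean(x[max(1, i - 2):min(length(x), i + 2)]),
  numeric(1)
)

recycled
cumulative
rolling
```

The rolling frame is truncated at the edges of the partition here, so the first and last rows average fewer than five values.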
259
259
260
260
To see how individual window functions are translated to SQL, we can again use `translate_sql()`:
@@ -266,14 +266,14 @@ translate_sql(ntile(G, 2), con = con)
266
266
translate_sql(lag(G), con = con)
267
267
```
268
268
269
-
If the tbl has been grouped or arranged previously in the pipeline, then dplyr will use that information to set the "partition by" and "order by" clauses. For interactive exploration, you can achieve the same effect by setting the `vars_group` and `vars_order` arguments to `translate_sql()`
269
+
If the tbl has been grouped or arranged previously in the pipeline, then dbplyr will use that information to set the "partition by" and "order by" clauses. For interactive exploration, you can achieve the same effect by setting the `vars_group` and `vars_order` arguments to `translate_sql()`:
270
270
271
271
```{r}
272
272
translate_sql(cummean(G), vars_order = "year", con = con)
273
273
translate_sql(rank(), vars_group = "ID", con = con)
274
274
```
275
275
276
-
There are some challenges when translating window functions between R and SQL, because dplyr tries to keep the window functions as similar as possible to both the existing R analogues and to the SQL functions. This means that there are three ways to control the order clause depending on which window function you're using:
276
+
There are some challenges when translating window functions between R and SQL, because dbplyr tries to keep the window functions as similar as possible to both the existing R analogues and to the SQL functions. This means that there are three ways to control the order clause depending on which window function you're using:
277
277
278
278
* For ranking functions, the ordering variable is the first argument: `rank(x)`,
279
279
`ntile(y, 2)`. If omitted or `NULL`, will use the default ordering associated