Unexpected behavior in .BY
#5389
Comments
This seems to be more an issue with the collecting procedure, since printing the values works as expected.
library(data.table)
dt = data.table(l = sample(letters[1:5], 100, replace = TRUE), b = rnorm(100))
setkey(dt, l)
dt[
  , m := {
    cat("BY:", as.character(.BY), "\n")
    cat("GRP:", as.character(.GRP), "\n")
    mean(b)
  }
  , by = l
]
#> BY: a
#> GRP: 1
#> BY: b
#> GRP: 2
#> BY: c
#> GRP: 3
#> BY: d
#> GRP: 4
#> BY: e
#> GRP: 5
It seems that
library(data.table)
dt = data.table(l = sample(letters[1:5], 100, replace = TRUE), b = rnorm(100))
setkey(dt, l)
byg = list()
grp = list()
dt[
  , m := {
    byg <<- append(byg, force(unlist(.BY)))
    grp <<- append(grp, .GRP)
    mean(b)
  }
  , by = l
]
str(byg)
#> List of 5
#> $ l: chr "a"
#> $ l: chr "b"
#> $ l: chr "c"
#> $ l: chr "d"
#> $ l: chr "e"
str(grp)
#> List of 5
#> $ : int 1
#> $ : int 2
#> $ : int 3
#> $ : int 4
#> $ : int 5
Thanks for your comments, @ben-schwen. When using Is this expected? Any ideas for just returning the
library(data.table)
dt = data.table(
l = sample(letters[1:2], 100, replace = TRUE)
, n = sample(1L:2L, 100, replace = TRUE)
, b = rnorm(100)
)
keys = c("l", "n")
setkeyv(dt, keys)
by_list = list()
by_unlist = list()
by_flatten = list()
dt[
  , m := {
    by_list <<- append(by_list, list(.BY))
    by_unlist <<- append(by_unlist, list(unlist(.BY)))
    by_flatten <<- append(by_flatten, list(purrr::flatten(.BY)))
    mean(b)
  }
  , by = keys
]
str(by_list)
#> List of 4
#> $ :List of 2
#> ..$ l: chr "b"
#> ..$ n: int 2
#> $ :List of 2
#> ..$ l: chr "b"
#> ..$ n: int 2
#> $ :List of 2
#> ..$ l: chr "b"
#> ..$ n: int 2
#> $ :List of 2
#> ..$ l: chr "b"
#> ..$ n: int 2
str(by_unlist)
#> List of 4
#> $ : Named chr [1:2] "a" "1"
#> ..- attr(*, "names")= chr [1:2] "l" "n"
#> $ : Named chr [1:2] "a" "2"
#> ..- attr(*, "names")= chr [1:2] "l" "n"
#> $ : Named chr [1:2] "b" "1"
#> ..- attr(*, "names")= chr [1:2] "l" "n"
#> $ : Named chr [1:2] "b" "2"
#> ..- attr(*, "names")= chr [1:2] "l" "n"
str(by_flatten)
#> List of 4
#> $ :List of 2
#> ..$ l: chr "a"
#> ..$ n: int 1
#> $ :List of 2
#> ..$ l: chr "a"
#> ..$ n: int 2
#> $ :List of 2
#> ..$ l: chr "b"
#> ..$ n: int 1
#> $ :List of 2
#> ..$ l: chr "b"
#> ..$ n: int 2
Created on 2022-05-27 by the reprex package (v2.0.1)
Please let me know if you think this would be better discussed on Stack Overflow. Thanks.
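For the multi-key case above, one way to get the by-group values without collecting .BY through side effects is to take the unique key combinations directly. The following is only a sketch under assumptions: the toy data is made up for illustration, and the names groups and by_values are hypothetical helpers, not part of data.table.

```r
library(data.table)

# Deterministic toy data (an assumption for illustration, not the data from the reprex)
dt = data.table(
  l = rep(c("a", "b"), each = 4)
  , n = rep(1L:2L, times = 4)
  , b = as.numeric(1:8)
)
keys = c("l", "n")
setkeyv(dt, keys)

# One row per group: the distinct combinations of the key columns
groups = unique(dt[, keys, with = FALSE])

# Turn each row into a named list, similar in shape to what .BY holds for that group
by_values = lapply(seq_len(nrow(groups)), function(i) as.list(groups[i]))
str(by_values)
```

If only the aggregate is needed, dt[, .(m = mean(b)), by = keys] already returns the by values as columns next to the result, so no separate collection step is required.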
If you take a look at the source of Lines 91 to 104 in e9a323d
you can see that BY is always assigned directly and overwritten in place for each group, so when you store it in a list without copying, every element points to the same memory. The following confirms this:
library(data.table)
dt = data.table(l = sample(letters[1:5], 100, replace = TRUE), b = rnorm(100))
setkey(dt, l)
by_list = list()
# you should preallocate in R
# by_list = vector(mode="list", length(unique(dt$l)))
dt[
  , m := {
    by_list <<- append(by_list, copy(.BY))
    cat(address(.BY), "\n")
    mean(b)
  }
  , by = l
]
#> 0x56343a667930
#> 0x56343a667930
#> 0x56343a667930
#> 0x56343a667930
#> 0x56343a667930
by_list
#> $l
#> [1] "a"
#>
#> $l
#> [1] "b"
#>
#> $l
#> [1] "c"
#>
#> $l
#> [1] "d"
#>
#> $l
#> [1] "e"
You can get the behavior you want by copying .BY before storing it, as copy(.BY) does in the code above. Whether the assignment works without copying probably depends on whether R copies internally. Previous R versions would always copy, but R-core has improved this a lot in recent years. Anyway, we should clarify this in the documentation, so I guess this is the right place to discuss it.
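The preallocation hinted at in the code comment can be combined with the copy. The sketch below assumes deterministic toy data (not the sampled data from the reprex) and uses .GRP as the index into a preallocated list, so each slot receives its own copy of .BY instead of a reference to the buffer that is reused across groups.

```r
library(data.table)

# Deterministic toy data (an assumption for illustration): 5 groups, 20 rows each
dt = data.table(l = rep(letters[1:5], times = 20), b = as.numeric(1:100))
setkey(dt, l)

# Preallocate one slot per group
by_list = vector(mode = "list", length = uniqueN(dt$l))

dt[
  , m := {
    # copy() detaches the stored value from the reused .BY object
    by_list[[.GRP]] <<- copy(.BY)
    mean(b)
  }
  , by = l
]

# Collect the stored group values into a character vector
by_values = vapply(by_list, function(x) x$l, character(1))
print(by_values)
```

Indexing by .GRP avoids growing the list with append() on every group, which matters more as the number of groups increases.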
Hi,
we are trying to extract the .BY information to a list object outside the current data.table. Unfortunately, we get unexpected results. All elements of the list, whose length equals the number of by groups, contain the same value: the .BY information from the last group.
When using .GRP instead, the list is populated with different values, namely the correct group indices.
Is this expected behavior, and how can I extract a list of the correct by-group values?
Thanks!
Created on 2022-05-23 by the reprex package (v2.0.1)