list_ungroup() to invert vec_group_loc()? #1851

Open
DavisVaughan opened this issue Jun 23, 2023 · 1 comment

Comments

DavisVaughan (Member) commented Jun 23, 2023

It is currently slightly awkward to "undo" a vec_group_loc() or vec_split() call. You can do it with list_unchop(), but that requires chopping the key into a list first.

library(vctrs)
set.seed(123)

x <- sample(1:4, size = 20, replace = TRUE)
x
#>  [1] 3 3 3 2 3 2 2 2 3 1 4 2 2 1 2 3 4 1 3 3

# Or `vec_split()` potentially, if we are splitting by something else
locs <- vec_group_loc(x)
locs
#>   key                       loc
#> 1   3 1, 2, 3, 5, 9, 16, 19, 20
#> 2   2    4, 6, 7, 8, 12, 13, 15
#> 3   1                10, 14, 18
#> 4   4                    11, 17

# Chopping here is awkward
list_unchop(vec_chop(locs$key), indices = locs$loc)
#>  [1] 3 3 3 2 3 2 2 2 3 1 4 2 2 1 2 3 4 1 3 3

We could reintroduce vec_unchop(<vector>, <list-of-indices>) to do this, but I think the "missing piece" is really a way to flatten the loc column: to turn a list of location vectors that point into the original x into a single location vector that points into key.

# Should be fairly fast to build this at the C level?
# Probably some checks on `x` to make sure every element is an integer vector
# and that no element exceeds `sum(list_sizes(x))`. May also want to remove
# `0` values ahead of time?
list_ungroup <- function(x) {
  out <- vec_init(integer(), n = sum(list_sizes(x)))
  
  for (i in seq_along(x)) {
    out <- vec_assign(out, x[[i]], i)
  }
  
  out
}

list_ungroup(locs$loc)
#>  [1] 1 1 1 2 1 2 2 2 1 3 4 2 2 3 2 1 4 3 1 1

vec_slice(locs$key, list_ungroup(locs$loc))
#>  [1] 3 3 3 2 3 2 2 2 3 1 4 2 2 1 2 3 4 1 3 3
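
As a possible illustration of the checks mentioned in the comments above, here is a minimal sketch of a loop-free variant. The name list_ungroup_vectorised() is hypothetical and not part of vctrs, and the permutation check is an assumption that is stricter than the bounds check suggested above; a C-level implementation may well look different.

```r
library(vctrs)

# Hypothetical helper, not part of vctrs: a loop-free take on list_ungroup().
list_ungroup_vectorised <- function(x) {
  # Assumed input contract: a list whose elements are all integer vectors
  stopifnot(is.list(x), all(vapply(x, is.integer, logical(1))))

  sizes <- list_sizes(x)
  n <- sum(sizes)

  # Flatten every location vector into one; `ptype` keeps the empty case typed
  loc <- list_unchop(x, ptype = integer())

  # Assumed check (stricter than a bounds check): the locations must be a
  # permutation of 1..n, which also rules out 0, NA, and duplicates
  stopifnot(!anyNA(loc), identical(sort(loc), seq_len(n)))

  # Group i contributes the value i once for each of its locations
  group <- vec_rep_each(seq_along(x), sizes)

  out <- vec_init(integer(), n = n)
  vec_assign(out, loc, group)
}

# The `loc` column from the example above, written out by hand
loc <- list(
  c(1L, 2L, 3L, 5L, 9L, 16L, 19L, 20L),
  c(4L, 6L, 7L, 8L, 12L, 13L, 15L),
  c(10L, 14L, 18L),
  c(11L, 17L)
)

list_ungroup_vectorised(loc)
#>  [1] 1 1 1 2 1 2 2 2 1 3 4 2 2 3 2 1 4 3 1 1
```

This does the assignment once instead of once per group, so the intermediate vector is not rebuilt on every iteration.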

orgadish commented Oct 8, 2023

@DavisVaughan list_ungroup seems very specifically about reversing vec_group_loc. What if, instead of trying to reverse vec_group_loc with a new function, a third column were built into the result of vec_group_loc that could be flattened directly (e.g. id, to match the terminology of vec_group_id)? (This would also resolve what I was looking for in #1857.)

I don't think there's a sensible way to include vec_group_id in the vec_group_loc data frame since the structure is inherently different.
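
For context on the vec_group_id terminology, the sketch below checks how the existing vec_group_id() relates to the key column of vec_group_loc() when both are computed from the same x. It assumes, as the vctrs documentation describes, that both functions number groups in order of first appearance; the input vector is the x from the example at the top of the thread, written out by hand.

```r
library(vctrs)

# The sampled `x` from the original example, written out by hand
x <- c(3L, 3L, 3L, 2L, 3L, 2L, 2L, 2L, 3L, 1L,
       4L, 2L, 2L, 1L, 2L, 3L, 4L, 1L, 3L, 3L)

locs <- vec_group_loc(x)
ids <- vec_group_id(x)

# vec_group_id() stores the number of groups in an "n" attribute;
# as.integer() drops it so the ids can be used as plain locations
plain_ids <- as.integer(ids)

# Slicing the key by the group ids reproduces the original vector
roundtrip <- vec_slice(locs$key, plain_ids)
identical(roundtrip, x)
```

When x itself is still available, the flattened locations can be obtained this way without a new function; the proposals in this thread matter for the case where only the list of locations is at hand.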
