We could implement chunkwise sample_n / sample_frac methods like this:
library(tidyverse)
# 1000 stacked copies of iris (150,000 rows) written to a temporary CSV
big <- rerun(1000, iris) %>% bind_rows()
path <- tempfile()
write_csv(big, path)
library(chunked)
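# chunkwise method: record the sample_n call so chunked replays it on every chunk
sample_n.chunkwise <- function(.data, size){
  # capture the call lazily; it is evaluated later against each chunk
  cmd <- lazyeval::lazy(sample_n(.data, size))
  chunked:::record(.data, cmd)
}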
read_csv_chunkwise(path) %>%
sample_n(1) %>%
collect()
That way, the sampling is done within each chunk, so sample_n(1) returns one row per chunk rather than one row overall.
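To make the per-chunk semantics concrete, here is a minimal base-R simulation. It does not use chunked at all: the helper name sample_n_per_chunk and the chunk size of 50 rows are hypothetical, chosen only to mimic what a chunkwise sample_n would return after collect().

```r
# Hypothetical helper: emulate a chunkwise sample_n on an in-memory data frame
# by splitting it into fixed-size chunks and sampling `size` rows from each.
sample_n_per_chunk <- function(df, size, chunk_size) {
  # chunk id for each row: 1,1,...,1,2,2,... (chunk_size rows per chunk)
  chunk_id <- ceiling(seq_len(nrow(df)) / chunk_size)
  chunks <- split(df, chunk_id)
  # draw `size` rows (without replacement) independently within every chunk
  sampled <- lapply(chunks, function(chunk) {
    chunk[sample.int(nrow(chunk), size), , drop = FALSE]
  })
  # stack the per-chunk samples, as collect() would
  do.call(rbind, sampled)
}

set.seed(1)
big <- do.call(rbind, rep(list(iris), 10))  # 1500 rows
res <- sample_n_per_chunk(big, size = 1, chunk_size = 50)
nrow(res)  # 30: one row per chunk, not one row overall
```

The result size scales with the number of chunks, which is worth keeping in mind for sample_n; a per-chunk sample_frac, by contrast, keeps roughly the requested fraction of the whole file.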
What do you think about that?
If it sounds like a good idea, let me know and I'll send you a PR.