Implementing sample_n and sample_frac #8

@ColinFay

We could implement chunk-wise versions of sample_n / sample_frac with:

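library(tidyverse)
# Build a large test file: 1000 stacked copies of iris
# (note: rerun() is deprecated as of purrr 1.0.0)
big <- rerun(1000, iris) %>% bind_rows()
path <- tempfile()
write_csv(big, path)

library(chunked)
# S3 method for chunkwise objects: record the call lazily so that it is
# replayed on every chunk when the data is collected
sample_n.chunkwise <- function(.data, size){
  cmd <- lazyeval::lazy(sample_n(.data, size))
  chunked:::record(.data, cmd)  # record() is internal to chunked, hence :::
}

read_csv_chunkwise(path) %>%
  sample_n(1) %>%
  collect()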

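That way the sampling is done independently within each chunk, so sample_n(1) would return one row per chunk rather than one row from the whole file.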
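
To make the per-chunk behaviour concrete, here is a minimal base-R sketch that emulates it on an in-memory data frame. It does not use chunked at all; `sample_per_chunk` and `chunk_size` are hypothetical names used only for illustration, and it assumes `size` is no larger than the smallest chunk.

```r
# Illustrative only: emulate chunk-wise sampling without the chunked package.
# `sample_per_chunk` and `chunk_size` are hypothetical, not part of chunked.
sample_per_chunk <- function(df, size, chunk_size) {
  # Assign each row to a chunk of at most `chunk_size` rows
  chunk_id <- ceiling(seq_len(nrow(df)) / chunk_size)
  chunks <- split(df, chunk_id)
  # Draw `size` rows from every chunk independently
  sampled <- lapply(chunks, function(chunk) {
    chunk[sample.int(nrow(chunk), size), , drop = FALSE]
  })
  do.call(rbind, sampled)
}

set.seed(1)
res <- sample_per_chunk(iris, size = 1, chunk_size = 50)
nrow(res)  # 3: one row from each of the three 50-row chunks
```

One consequence of this design is that the number of rows returned scales with the number of chunks, so the result of a chunk-wise sample_n depends on the chunk size.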

What do you think about that?
If it sounds like a good idea, let me know and I'll send you a PR.
