I'm opening this issue to get some community insight/help on this. As I understand it, starting with a Dask bag is the most natural first step.
I understand a bag as a partitioned data structure on which we can apply functional combinators like `map`, `filter`, `fold`, etc. Given polars' memory layout, it seems most natural for each partition to hold columnar memory, or batches of columnar memory. Row-oriented partitions would be very inefficient.
This might be a context switch, since Spark RDDs seem to operate on rows, and pandas is also somewhat row-oriented: its block manager stores blocks as C-ordered numpy arrays.
Another interesting direction would be to partition the lazy polars queries themselves and hold no data at all, although the dask graph already does something similar.
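For illustration, the "partitions hold queries, not data" idea could be sketched with plain closures. Everything here (`make_lazy_partition`, `collect`) is hypothetical naming, not polars or dask API; the point is only that no partition materializes data until collection:

```python
# Hypothetical sketch: each partition carries a query (a thunk),
# not data, mirroring the idea of partitioning lazy polars queries.

def make_lazy_partition(source_rows, transform):
    # returns a zero-argument closure; nothing is computed here
    return lambda: transform(source_rows)

partitions = [
    make_lazy_partition(range(0, 5), lambda rows: [r * r for r in rows]),
    make_lazy_partition(range(5, 10), lambda rows: [r * r for r in rows]),
]

def collect(parts):
    # only now does each partition execute its query and produce data
    return [p() for p in parts]

results = collect(partitions)
print(results[0])  # [0, 1, 4, 9, 16]
```

In a real integration, each thunk would presumably wrap a polars `LazyFrame` over a slice of the source, and the scheduler would decide when and where to collect it.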