L1 arrow compaction #433

Open
thorfour opened this issue Apr 27, 2023 · 11 comments
Labels
planned Planned work wont get closed by stalebot

Comments

@thorfour
Contributor

thorfour commented Apr 27, 2023

It may be useful to have the option to compact L0 arrow records into L1 arrow records instead of Parquet.

@thorfour
Contributor Author

thorfour commented May 1, 2023

This may only be worth pursuing once the REE (run-end encoding) support changes are in FrostDB and the record sorting implementation (apache/arrow#34719) is completed.

@asubiotto
Member

Agreed. I think moving to arrow-only in-mem would be the last step in this quarter.

@gernest
Contributor

gernest commented Dec 18, 2023

I have been thinking about this. Is it the same as arrowutils.MergeRecords(arrow_parts...) |> arrowutils.SortRecord |> parts.NewArrowPart?

@asubiotto
Member

Yes, although given the arrow parts should be merged on input, there probably isn't a need for the downstream sort. I'd also be interested in getting some L0 to L1 stats on how much memory we reduce through arrow compaction vs parquet compaction.
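
One reading of the point about skipping the downstream sort: if every input part is already sorted, a k-way merge emits rows in sorted order, so a second sort adds nothing. The sketch below illustrates that idea in plain Go on int64 slices. It does not use the FrostDB or Arrow APIs; `mergeSorted`, `mergeHeap`, and `cursor` are hypothetical names invented for this illustration, and whether arrowutils.MergeRecords works this way is an assumption.

```go
package main

import (
	"container/heap"
	"fmt"
)

// cursor tracks the read position within one sorted input part.
type cursor struct {
	part int // index of the input part
	pos  int // next unread row in that part
}

// mergeHeap is a min-heap of cursors ordered by the row each one points at.
type mergeHeap struct {
	parts   [][]int64
	cursors []cursor
}

func (h *mergeHeap) Len() int { return len(h.cursors) }
func (h *mergeHeap) Less(i, j int) bool {
	a, b := h.cursors[i], h.cursors[j]
	return h.parts[a.part][a.pos] < h.parts[b.part][b.pos]
}
func (h *mergeHeap) Swap(i, j int) { h.cursors[i], h.cursors[j] = h.cursors[j], h.cursors[i] }
func (h *mergeHeap) Push(x any)    { h.cursors = append(h.cursors, x.(cursor)) }
func (h *mergeHeap) Pop() any {
	n := len(h.cursors)
	c := h.cursors[n-1]
	h.cursors = h.cursors[:n-1]
	return c
}

// mergeSorted merges parts that are each already sorted ascending.
// The output is sorted without any additional sort pass.
func mergeSorted(parts [][]int64) []int64 {
	h := &mergeHeap{parts: parts}
	total := 0
	for i, p := range parts {
		total += len(p)
		if len(p) > 0 {
			h.cursors = append(h.cursors, cursor{part: i})
		}
	}
	heap.Init(h)

	out := make([]int64, 0, total)
	for h.Len() > 0 {
		c := h.cursors[0] // cursor pointing at the smallest unread row
		out = append(out, parts[c.part][c.pos])
		if c.pos+1 < len(parts[c.part]) {
			h.cursors[0].pos++ // advance within the same part
			heap.Fix(h, 0)
		} else {
			heap.Pop(h) // this part is exhausted
		}
	}
	return out
}

func main() {
	fmt.Println(mergeSorted([][]int64{{1, 4, 9}, {2, 3, 10}, {5}}))
}
```

If the inputs are not individually sorted, this guarantee does not hold and a sort step would still be needed.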

@gernest
Contributor

gernest commented Dec 20, 2023

@asubiotto can you expand a bit on the memory expectations for arrow vs. parquet compaction?

I was always under the impression that parquet+compression gives better memory savings than arrow.

@asubiotto
Member

Yes, this is why I'd be interested in getting some numbers so we are informed about the tradeoffs. Intuitively, dictionary encoding should go a long way. We've also been thinking about experimenting with run end encoding in arrow.
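
To make the intuition concrete, here is a minimal sketch of the two encodings mentioned above, written in plain Go rather than against the Arrow library. `dictEncode` and `runEndEncode` are hypothetical helpers for illustration only; the run ends are stored as cumulative (exclusive) offsets, which is my understanding of how Arrow lays out run-end encoded arrays. It says nothing about actual FrostDB memory numbers, which still need to be measured.

```go
package main

import "fmt"

// dictEncode replaces each string with an index into a table of distinct values.
// Repeated strings are stored once in dict; the column itself becomes small integers.
func dictEncode(col []string) (dict []string, indices []int32) {
	seen := make(map[string]int32)
	indices = make([]int32, 0, len(col))
	for _, v := range col {
		idx, ok := seen[v]
		if !ok {
			idx = int32(len(dict))
			seen[v] = idx
			dict = append(dict, v)
		}
		indices = append(indices, idx)
	}
	return dict, indices
}

// runEndEncode collapses consecutive equal values into runs.
// runEnds[i] is the exclusive end offset of run i in the original column.
func runEndEncode(col []int32) (values []int32, runEnds []int32) {
	for i, v := range col {
		if i > 0 && v == col[i-1] {
			runEnds[len(runEnds)-1] = int32(i + 1) // extend the current run
			continue
		}
		values = append(values, v)
		runEnds = append(runEnds, int32(i+1))
	}
	return values, runEnds
}

func main() {
	col := []string{"api", "api", "api", "db", "db", "api"}
	dict, idx := dictEncode(col)
	vals, ends := runEndEncode(idx)
	fmt.Println(dict, idx, vals, ends)
}
```

Both encodings benefit from sorted data: sorting groups equal values together, which lengthens runs and so shrinks the run-end encoded form.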


This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jan 20, 2024
@asubiotto asubiotto removed the Stale label Jan 20, 2024
@github-actions github-actions bot added the Stale label Feb 20, 2024
@asubiotto asubiotto removed the Stale label Feb 20, 2024
@github-actions github-actions bot added the Stale label Mar 22, 2024
@github-actions github-actions bot closed this as not planned Mar 27, 2024
@asubiotto asubiotto reopened this Apr 15, 2024
@asubiotto
Member

I think it's still useful to keep this open.

@github-actions github-actions bot removed the Stale label Apr 16, 2024
@github-actions github-actions bot added the Stale label May 16, 2024
@asubiotto asubiotto removed the Stale label May 16, 2024
@thorfour thorfour added the planned Planned work wont get closed by stalebot label May 17, 2024
3 participants