Skip to content

Conversation

@nitsanw
Copy link
Contributor

@nitsanw nitsanw commented Sep 29, 2025

patch by Nitsan Wakart; reviewed by TBD for CASSANDRA-TBD

  • EPPv1 - all new code
  • Cursor compaction integration
  • JMH benchmarks for compaction and cursor impls
  • EPPv1 - New tests
  • Existing tests tweaks for new code
  • [revert?] change the default partitioner to expand testing of new code
  • [revert?] test data used some benchmarks
  • [revert?] jmh tweak GC settings for stability
  • [revert?] javadoc typos, marking unused params
  • [revert?] clarifying comment
  • [revert?] toString improvement
  • [revert?] remove spurious keywords
  • [revert?] marking metadata collection
  • [revert?] cursor verifier
  • Exclude SAI and counter column
  • Exclude BTI and legacy versions
  • Temporarily skip very long running test

Thanks for sending a pull request! Here are some tips if you're new here:

  • Ensure you have added or run the appropriate tests for your PR.
  • Be sure to keep the PR description updated to reflect all changes.
  • Write your PR title to summarize what this PR proposes.
  • If possible, provide a concise example to reproduce the issue for a faster review.
  • Read our contributor guidelines
  • If you're making a documentation change, see our guide to documentation contribution

Commit messages should follow the following format:

<One sentence description, usually Jira title or CHANGES.txt summary>

<Optional lengthier description (context on patch)>

patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####

Co-authored-by: Name1 <email1>
Co-authored-by: Name2 <email2>

The Cassandra Jira

@nitsanw nitsanw changed the title Add cursor based optimized compaction path (WIP) CASSANDRA-20918 Add cursor-based low allocation optimized compaction implementation Sep 29, 2025
@blambov blambov self-requested a review September 30, 2025 07:40
Copy link
Contributor

@blambov blambov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It must be stated that this approach that bundles all the steps of the processing in one single file will be quite difficult to maintain and keep in sync with the combination of iterators and transformations that we use in other parts of the code such as the query path. However, once we have reached a point of stability for a piece of functionality where we do not expect it to change significantly for a long time, it does makes sense to unpack the code and present it in a way that makes its execution as direct as possible, and this patch is a good such representation of the compaction process.

Personally, I am very unhappy about switching to mutable, pooled and reused objects, which are significantly more unwieldy and error prone, especially in contexts where concurrent access can occur. It seems this is becoming a necessity if we need to achieve acceptable performance with the current state of our heap usage, but we still need to very carefully separate the mutable versions of concepts from the immutable ones used throughout the code base. Suddenly making a DeletionTime mutable is not an acceptable change.

First batch of targeted comments below, mainly going over CompactionCursor.java.

Copy link
Contributor

@blambov blambov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next batch of comments.

}
if (!UnfilteredSerializer.hasAllColumns(flags))
{
// TODO: re-implement GC free
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataStax's branch has an implementation of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's an important optimization, but it should not block this PR IMO.

Copy link
Contributor

@blambov blambov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next batch of comments.

@nitsanw nitsanw force-pushed the compaction-work-pr-prep branch 2 times, most recently from 4a3bed8 to ef0a4ba Compare November 13, 2025 08:18
@nitsanw
Copy link
Contributor Author

nitsanw commented Nov 13, 2025

Squashed fixup commits and rebased on top of latest trunk for clarity

@nitsanw nitsanw requested a review from blambov November 18, 2025 12:01
@nitsanw
Copy link
Contributor Author

nitsanw commented Nov 18, 2025

@blambov I think your concerns have been addressed and OK to proceed?

Looking for a second reviewer

@nitsanw nitsanw force-pushed the compaction-work-pr-prep branch from ef0a4ba to 2243d4f Compare November 24, 2025 06:10
@nitsanw nitsanw force-pushed the compaction-work-pr-prep branch from 8844205 to ebf366d Compare November 25, 2025 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants