OAK-11154: Read partial segments from SegmentWriter #1746
Open
Nicolapps wants to merge 1 commit into apache:trunk from
This pull request modifies the SegmentWriter interface in oak-segment-tar to make it possible to read the state of a segment that is currently being written, as described in OAK-11154.

Closes OAK-11154
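The backwards-compatibility approach the PR describes (a new interface method whose default implementation throws UnsupportedOperationException) can be sketched as follows. This is a minimal illustration, not the actual oak-segment-tar code: PartialSegmentWriter, LegacyWriter and the String segment ID are hypothetical, simplified stand-ins for SegmentWriter and SegmentId, and returning a byte[] snapshot is an assumption made for the sketch.

```java
/**
 * Hypothetical, simplified stand-in for Oak's SegmentWriter.
 * The real oak-segment-tar signatures (e.g. SegmentId, Segment) differ.
 */
interface PartialSegmentWriter {
    /** Existing operation (simplified): persists the segment being written. */
    void flush();

    /**
     * New operation (sketch): returns the data written so far to the
     * unflushed segment identified by segmentId. The default implementation
     * throws, so implementations written before this method still compile.
     */
    default byte[] readPartialSegmentState(String segmentId) {
        throw new UnsupportedOperationException("readPartialSegmentState is not supported");
    }
}

/** A pre-existing implementation that knows nothing about the new method. */
class LegacyWriter implements PartialSegmentWriter {
    @Override
    public void flush() {
        // No-op in this sketch.
    }
}

public class DefaultMethodDemo {
    public static void main(String[] args) {
        PartialSegmentWriter writer = new LegacyWriter();
        try {
            writer.readPartialSegmentState("some-segment-id");
        } catch (UnsupportedOperationException e) {
            // Callers of writers that do not support the feature get a clear signal.
            System.out.println("not supported: " + e.getMessage());
        }
    }
}
```

Because the method has a default body, existing implementers such as LegacyWriter need no changes; only implementations that can actually expose their buffer need to override it.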
Why?
oak-segment-tar writes new segments using an implementation of SegmentWriter. Since segments are immutable, the state of a segment that hasn't been flushed yet isn't visible outside of the SegmentWriter instance. However, in some cases, code using SegmentWriter might want to access the partial segment data.

Currently, the only way to do this is to call flush, which forces the segment to be flushed right away, and then to read the full segment from the underlying segment store. This is bad for performance, because it causes more flushes than necessary, and because it risks creating many segments that are much smaller than MAX_SEGMENT_SIZE.

To avoid this, I suggest that we add a readPartialSegmentState method to SegmentWriter, which takes the segment ID of an unflushed segment and returns that segment's partial state if possible.
Backwards-compatibility
This change is backwards-compatible with existing users of SegmentWriter (because they don't use the new method). The new method comes with a default implementation which throws an UnsupportedOperationException.
Concurrency
Previously, SegmentBufferWriter was marked as not thread-safe, which made sense since it was only expected to be used by a single writer thread at a time (concurrent calls wouldn't have made sense anyway, since the order in which the prepare and writeXYZ methods are called matters).

One major change to SegmentBufferWriter is that its readPartialSegmentState method can now be called concurrently with the other methods of the class. To support this, the publicly accessible methods are now synchronized. This shouldn't cause a drop in performance: most calls to the class come from the writer thread (so they don't contend with each other), and readPartialSegmentState is expected to be called rarely compared to the other methods.

To confirm that there is no noticeable drop in performance, I ran the write benchmarks without and with the change, and observed no difference:
[Benchmark results: without synchronized]
[Benchmark results: with synchronized]
Testing
The PR adds a new test, readPartialSegmentState, which covers the implementation of the method in SegmentBufferWriter.
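The concurrency guarantee described for SegmentBufferWriter (a writer thread appending while another thread occasionally calls readPartialSegmentState, with the public methods synchronized) could be exercised along the lines of the sketch below. SketchBufferWriter is a hypothetical, heavily simplified stand-in: it buffers raw bytes, uses String segment IDs, and "flushes" into an in-memory list. It is not the real SegmentBufferWriter or its test.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simplified buffer writer; every public-facing method is synchronized. */
class SketchBufferWriter {
    private final int maxSegmentSize;
    private final List<byte[]> flushed = new ArrayList<>();
    private ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private int segmentCounter = 0;
    private String currentSegmentId = "segment-0";

    SketchBufferWriter(int maxSegmentSize) {
        this.maxSegmentSize = maxSegmentSize;
    }

    /** Writer-thread operation: starts a new segment once the current one is full. */
    synchronized void writeByte(int b) {
        if (buffer.size() >= maxSegmentSize) {
            flush(); // synchronized is reentrant, so this nested call is safe
        }
        buffer.write(b);
    }

    synchronized void flush() {
        flushed.add(buffer.toByteArray());
        buffer = new ByteArrayOutputStream();
        segmentCounter++;
        currentSegmentId = "segment-" + segmentCounter;
    }

    synchronized String currentSegmentId() {
        return currentSegmentId;
    }

    synchronized int flushedCount() {
        return flushed.size();
    }

    /** May be called from any thread; returns a copy, or null if the segment is not the unflushed one. */
    synchronized byte[] readPartialSegmentState(String segmentId) {
        if (!segmentId.equals(currentSegmentId)) {
            return null;
        }
        return buffer.toByteArray(); // toByteArray() returns a fresh copy
    }
}

public class ConcurrentReadDemo {
    public static void main(String[] args) throws InterruptedException {
        SketchBufferWriter writer = new SketchBufferWriter(4096);
        String id = writer.currentSegmentId();

        Thread writerThread = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                writer.writeByte(i & 0xFF);
            }
        });
        writerThread.start();

        // Meanwhile, read snapshots; each must be a consistent prefix of what is written.
        while (writerThread.isAlive()) {
            byte[] snapshot = writer.readPartialSegmentState(id);
            for (int i = 0; i < snapshot.length; i++) {
                if (snapshot[i] != (byte) i) {
                    throw new AssertionError("inconsistent snapshot at index " + i);
                }
            }
        }
        writerThread.join();

        System.out.println("partial segment size: " + writer.readPartialSegmentState(id).length);
    }
}
```

Because each method holds the same monitor, a reader never observes the buffer halfway through a write or a flush; on the writer thread alone the lock is uncontended, which is consistent with the PR's observation that the benchmarks showed no difference.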