Add Isolated Replicas feature to download index updates from remote backend #875
Adds the isolated replicas feature. When enabled, the replica downloads index updates directly from the remote backend (S3). This feature is only usable if the index is started without a connection to the primary.
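As a rough illustration of the behaviour described above, the check-then-download loop could look like the sketch below. It is not the code in this PR: IsolatedReplicaPoller, latestRemoteVersion, and downloadVersion are hypothetical names, and the only detail taken from the description is that the replica periodically checks the remote backend for a new index version.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

/** Hypothetical poller for illustration; not a class added by this PR. */
final class IsolatedReplicaPoller {
  // Assumption: returns the newest index version known to the remote backend.
  private final LongSupplier latestRemoteVersion;
  // Assumption: downloads the files for the given index version.
  private final LongConsumer downloadVersion;
  private final AtomicLong currentVersion;

  IsolatedReplicaPoller(
      long startVersion, LongSupplier latestRemoteVersion, LongConsumer downloadVersion) {
    this.currentVersion = new AtomicLong(startVersion);
    this.latestRemoteVersion = latestRemoteVersion;
    this.downloadVersion = downloadVersion;
  }

  /** Checks the backend once; returns true if a newer version was downloaded. */
  boolean pollOnce() {
    long remote = latestRemoteVersion.getAsLong();
    if (remote <= currentVersion.get()) {
      return false; // nothing new, wait for the next poll
    }
    downloadVersion.accept(remote);
    currentVersion.set(remote); // only advance after the download returned
    return true;
  }

  long currentVersion() {
    return currentVersion.get();
  }

  /** Runs pollOnce() every pollingIntervalSeconds on one background thread. */
  ScheduledExecutorService start(int pollingIntervalSeconds) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleWithFixedDelay(
        () -> {
          try {
            pollOnce();
          } catch (RuntimeException e) {
            // Swallowed so one failed poll does not cancel the schedule;
            // a real implementation would log this.
          }
        },
        pollingIntervalSeconds,
        pollingIntervalSeconds,
        TimeUnit.SECONDS);
    return executor;
  }
}
```

Keeping pollOnce() separate from the scheduling makes the version comparison testable without waiting on a timer.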
Config
enabled: if the feature is enabled, default: false
pollingIntervalSeconds: how long to wait before checking for a new index version, default: 120
CopyJobManager
The CopyJobManager interface was added to extract the gRPC-specific logic out of the NrtReplicaNode. The existing logic was moved into the GrpcCopyJobManager, and the RemoteCopyJobManager was created for copying from the remote backend. Methods:
start: called at the end of NrtReplicaNode start
newCopyJob: creates a copy job for use by the Lucene NRT replication system
finishNRTCopy: called after a copy job finishes
NrtDataManager
This class now manages starting a file download from the remote backend. It also keeps track of the currently active NRT point and that point's timestamp. When a copy job finishes successfully, the point state and timestamp are updated.
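The "update only on success" rule can be sketched as follows. PointInfo and ActivePointTracker are hypothetical names used for illustration, not the fields or methods of NrtDataManager, and the millisecond unit for the timestamp is an assumption.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical value type: an NRT point version plus the timestamp stored with it. */
final class PointInfo {
  final long version;
  final long timestampMs; // assumption: epoch milliseconds

  PointInfo(long version, long timestampMs) {
    this.version = version;
    this.timestampMs = timestampMs;
  }
}

/** Hypothetical tracker for the currently active point. */
final class ActivePointTracker {
  private final AtomicReference<PointInfo> active = new AtomicReference<>();

  /** Called when a copy job ends; a failed job leaves the active point untouched. */
  void onCopyFinished(PointInfo copied, boolean success) {
    if (success) {
      active.set(copied); // version and timestamp are swapped together
    }
  }

  /** Returns the active point, or null if no copy has succeeded yet. */
  PointInfo active() {
    return active.get();
  }
}
```

Holding the version and timestamp in one immutable object means a reader never sees a new version paired with an old timestamp.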
InputStreamDataInput
The base Lucene file copy API uses the DataInput interface. The InputStreamDataInput wraps the file InputStream so it can be used directly.
RemoteBackend
The remote backend is now able to download an index file directly, presenting it as an InputStream. Getting the point state from the backend now also returns the timestamp associated with that point.
CopyOneFile
This is a Lucene class that we are redefining and shadowing with the classpath. Since the definition was specific to gRPC, I changed this class into an interface. The previous implementation is now GrpcCopyOneFile. A new implementation has been added called StreamCopyOneFile. This is closer to the stock implementation and hopefully we can unhack this code at some point.
Note: many of the tests were generated by AI