-
-
Notifications
You must be signed in to change notification settings - Fork 809
Description
Describe the bug
The IndexWriter::committed_opstamp
field is initialized during IndexWriter
creation but is never updated after successful commits. This causes severe issues with rollback functionality, delete operations, and can lead to data corruption through duplicate opstamps.
Which version of tantivy are you using?
both main and v0.22.0
Current Behavior
committed_opstamp
is initialized from disk metadata when creating anIndexWriter
- After each successful commit,
committed_opstamp
remains at its initial value (0 when using the fresh index) - The public method
commit_opstamp()
always returns the initial value, not the actual last committed opstamp
Expected Behavior
According to the documentation, commit_opstamp()
should return "the opstamp of the last successful commit". The committed_opstamp
field should be updated after each successful commit.
Reproduction
use tantivy::schema::{Schema, TEXT};
use tantivy::{doc, Index, IndexWriter};
fn main() -> tantivy::Result<()> {
let mut schema_builder = Schema::builder();
let text_field = schema_builder.add_text_field("text", TEXT);
let index = Index::create_in_ram(schema_builder.build());
let mut index_writer: IndexWriter = index.writer_with_num_threads(1, 10_000_000)?;
// Initial committed_opstamp is 0
assert_eq!(index_writer.commit_opstamp(), 0);
// Add document and commit
let opstamp1 = index_writer.add_document(doc!(text_field => "first"))?;
let commit_result = index_writer.commit()?;
// BUG: committed_opstamp should be updated to opstamp1, but it's still 0
assert_eq!(index_writer.commit_opstamp(), 0); // Should be opstamp1!
// This breaks rollback functionality
index_writer.add_document(doc!(text_field => "second"))?;
index_writer.rollback()?; // This reverts stamper to 0 instead of opstamp1!
// Now we get duplicate opstamps!
let opstamp3 = index_writer.add_document(doc!(text_field => "third"))?;
assert_eq!(opstamp3, 1); // Same as opstamp1 - DUPLICATE!
Ok(())
}
Impact
This bug causes multiple severe issues:
1. Broken Rollback Mechanism
rollback()
callsself.stamper.revert(self.committed_opstamp)
- Since
committed_opstamp
is never updated, it always reverts to 0 - This causes duplicate opstamps and violates monotonicity guarantees
2. Incorrect delete_all_documents()
- Also uses
self.stamper.revert(self.committed_opstamp)
- Returns wrong opstamp value to the caller
3. Data Integrity Issues
- After rollback, new documents can get opstamps that already exist in committed segments
- Violates the fundamental assumption that opstamps are monotonically increasing
- Can lead to incorrect document ordering and deletion behavior
4. Misleading Public API
commit_opstamp()
returns incorrect values- Any user code relying on this method for synchronization is broken
Root Cause
The issue stems from having two separate opstamp tracking mechanisms that are not synchronized:
-
SegmentUpdater correctly maintains the committed opstamp:
- In
schedule_commit()
, it saves the opstamp to disk viasave_metas(opstamp, payload)
- It updates its internal
active_index_meta
with the correct opstamp - This is what gets persisted to disk and loaded on restart
- In
-
IndexWriter has its own
committed_opstamp
field:- Initialized from disk metadata when creating the IndexWriter (line 312)
- Never updated after successful commits
- Used by
rollback()
,delete_all_documents()
, and the publiccommit_opstamp()
method
The commit flow in index_writer.rs
:
commit()
→prepare_commit()?.commit()
PreparedCommit::commit()
→segment_updater.schedule_commit(opstamp)
- SegmentUpdater correctly updates its metadata, but IndexWriter's
committed_opstamp
field remains stale
Suggested Fix
The committed_opstamp
field should be updated after a successful commit. Options include:
- Update after commit completes: When
schedule_commit
returns successfully, updateIndexWriter::committed_opstamp
- Query SegmentUpdater: Have
commit_opstamp()
query the current opstamp fromsegment_updater.load_meta().opstamp
instead of using a stale field - Remove the field: Since SegmentUpdater already tracks this correctly, consider removing the redundant field from IndexWriter
The key is ensuring that rollback and delete operations use the actual last committed opstamp, not a stale value from initialization.
Additional Notes
This bug has likely existed for a long time but may have gone unnoticed because:
- Many users don't use rollback functionality
- The effects might not be immediately visible
- The public
commit_opstamp()
method might not be widely used
However, for applications that rely on rollback or the opstamp values for consistency, this is a critical data integrity issue.