fix(buffers): fix panic in disk buffer when dealing with corrupted file #23617
Summary
At startup, the writer checks the last written file; if it is corrupted, it sets a flag to skip to the next file, but the current writer file ID in the ledger is not updated and the new file is not created.
When the reader hits this corrupted file, it rolls over to the next file and updates the readable file ID. At this point the reader file ID can be greater than the writer file ID if the reader was already done with the last file. In
seek_to_next_record
in the reader, both the reader and writer are initialized along with the ledger, so the expectation is that any read should block until the writer creates the new file and writes data.
Instead, each read keeps incrementing the next file ID to read and eventually loops back to a file that was already read. This causes panics in various places.
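The wrap-around can be sketched as follows. This is a hypothetical model of the lookup loop, not Vector's actual code; `MAX_FILE_ID`, `next_file_id`, and `find_readable_file` are assumed names, and the ring size of 32 is illustrative:

```rust
// Illustrative model of the buggy lookup loop; names and the ring size
// are assumptions, not Vector's actual constants or API.
const MAX_FILE_ID: u16 = 32; // disk buffer file IDs live in a fixed ring

// Advance to the next file ID, wrapping around the ring.
fn next_file_id(id: u16) -> u16 {
    (id + 1) % MAX_FILE_ID
}

// The reader keeps incrementing the file ID until it finds a file on disk.
// Because the writer never created its next file, the search wraps around
// and lands on a file the reader has already consumed.
fn find_readable_file(start: u16, files_on_disk: &[u16]) -> u16 {
    let mut id = start;
    while !files_on_disk.contains(&id) {
        id = next_file_id(id);
    }
    id
}
```

With only the already-read file on disk, the loop silently returns it instead of blocking for the writer.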
The fix is to make the reader wait for the writer to create a file whenever the reader file ID matches either the current or the next writer file ID; the writer may end up creating either one, depending on the skip flag.
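A minimal sketch of that condition, under assumed names (the function, its signature, and `MAX_FILE_ID` are illustrative, not the actual reader code):

```rust
// Hypothetical sketch of the fix condition; not Vector's actual API.
const MAX_FILE_ID: u16 = 32;

/// The reader should block and wait for the writer instead of advancing
/// when its file ID is the writer's current file ID or the next one:
/// depending on the corruption-skip flag, the writer creates one of them.
fn should_wait_for_writer(reader_file_id: u16, writer_file_id: u16) -> bool {
    let next_writer_file_id = (writer_file_id + 1) % MAX_FILE_ID;
    reader_file_id == writer_file_id || reader_file_id == next_writer_file_id
}
```

Note the modular increment, so the check still holds when the writer file ID wraps around the ring.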
The sequence of logs also illustrates what happens when this issue is hit:
1. The reader hits the bad file and rolls over to the next file.
2. A subsequent read enters the lookup loop and keeps incrementing the reader file ID until it wraps around and opens a file it has already read.
3. Reopening an old file yields a record ID lower than the previous one, which gives the impression of a huge number of records being skipped. The total buffer size in the ledger is also decremented and can drop below zero, causing a panic.
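The underflow itself is easy to picture: the ledger tracks the total buffer size as an unsigned counter, so double-counting reads drives the subtraction below zero. A toy illustration, with an assumed representation and panic message rather than the ledger's real ones:

```rust
// Toy model of the ledger accounting; the function name and panic message
// are assumptions. An unsigned total cannot go negative, so subtracting
// more bytes than were recorded panics instead of producing a negative size.
fn decrement_total_buffer_size(total: u64, record_len: u64) -> u64 {
    total
        .checked_sub(record_len)
        .expect("total buffer size dropped below zero")
}
```

When the reader re-consumes records from a wrapped-around file, `record_len` is subtracted a second time and the counter trips this panic.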
How did you test this PR?
Added a unit test that reproduces the panic when the fix is removed.
Change Type
Is this a breaking change?
Does this PR include user facing changes?
No; add the
no-changelog
label to this PR.
References
Notes
- Use @vectordotdev/vector to reach out to us regarding this PR.
- These checks can be run via the pre-push hook; please see this template.
- Run `cargo fmt --all`
- Run `cargo clippy --workspace --all-targets -- -D warnings`
- Run `cargo nextest run --workspace` (alternatively, you can run `cargo test --all`)
- Run `git merge origin master` and `git push`.
- If your PR changes dependencies (`Cargo.lock`), please run `cargo vdev build licenses` to regenerate the license inventory and commit the changes (if any). More details here.