TiKV crash in compaction filter caused by IoError: No Such file or directory: While open a file for random read: /data/xxx/yyy/zzz/115748.blob #328

Open
guoxiangCN opened this issue Sep 2, 2024 · 8 comments

@guoxiangCN
Contributor

This error is different from the existing, older "Missing blob" error.

@guoxiangCN
Contributor Author

guoxiangCN commented Sep 2, 2024

Status BlobStorage::Get(const ReadOptions& options, const BlobIndex& index,
                        BlobRecord* record, PinnableSlice* buffer) {
  auto sfile = FindFile(index.file_number).lock();
  if (!sfile)
    return Status::Corruption("Missing blob file: " +
                              std::to_string(index.file_number));
  
  // NOTE-1: the purge-obsolete-files thread can delete the file at this point,
  // in which case the next line reports the error.
  
  return file_cache_->Get(options, sfile->file_number(), sfile->file_size(),
                          index.blob_handle, record, buffer);
}
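The window marked NOTE-1 can be reproduced deterministically in a small model. The sketch below is not Titan code: `FakeBlobStorage`, `Code`, `PurgeObsoleteFile`, and the `between_steps` hook are hypothetical names introduced for illustration, and the "disk" is just an in-memory set. It only assumes what the snippet above shows, namely that the metadata lookup and the file open are two separate steps.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <set>

// Hypothetical stand-ins for illustration; these are not Titan's real types.
enum class Code { kOk, kCorruption, kIOError };

struct BlobFileMeta {
  uint64_t file_number;
};

class FakeBlobStorage {
 public:
  void AddFile(uint64_t number) {
    files_[number] = std::make_shared<BlobFileMeta>(BlobFileMeta{number});
    disk_.insert(number);
  }

  // Models the purge thread: drops the metadata and unlinks the file.
  void PurgeObsoleteFile(uint64_t number) {
    files_.erase(number);
    disk_.erase(number);
  }

  // Same two-step shape as Get() above. `between_steps` runs inside the
  // window marked NOTE-1, so a test can place the purge exactly there.
  Code Get(uint64_t number, const std::function<void()>& between_steps) {
    std::shared_ptr<BlobFileMeta> sfile = FindFile(number).lock();
    if (!sfile) return Code::kCorruption;  // "Missing blob file"
    if (between_steps) between_steps();
    // `sfile` keeps the metadata alive, but the file itself may be gone.
    return disk_.count(sfile->file_number) ? Code::kOk : Code::kIOError;
  }

 private:
  std::weak_ptr<BlobFileMeta> FindFile(uint64_t number) const {
    auto it = files_.find(number);
    if (it == files_.end()) return {};
    return it->second;
  }

  std::map<uint64_t, std::shared_ptr<BlobFileMeta>> files_;
  std::set<uint64_t> disk_;  // file numbers that still exist "on disk"
};
```

In this model, a purge that runs before the lookup yields the older "Missing blob file" corruption status, while a purge that lands between the lookup and the open yields the I/O error reported in this issue.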

@guoxiangCN
Contributor Author

I have checked the code here, and there is indeed a race condition.

@guoxiangCN
Contributor Author

@v01dstar Hello, could you help confirm this?

@v01dstar
Contributor

v01dstar commented Sep 6, 2024

At first glance this seems possible; allow me to dig more.

@v01dstar
Contributor

v01dstar commented Sep 6, 2024

I think this is indeed a problem unless we set skip_value_in_compaction_filter to true, which we don't. I am surprised that we don't see this error in our users' environments. If I didn't miss anything, this is more than a race condition: since the compaction filter does not go through the normal read path (i.e. a read with a snapshot), this should happen quite frequently.
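To illustrate why that option would sidestep the error: if the filter is never handed the value, the blob file is never opened, so a purged file cannot fail the read. The sketch below is a simplified model, not Titan's implementation. Only the option name `skip_value_in_compaction_filter` comes from this thread; `CfOptions`, `BlobReader`, `FilterInput`, and `PrepareFilterInput` are hypothetical, and it is an assumption that the filter sees an empty value when the flag is set.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical model; only the option name is taken from the discussion.
struct CfOptions {
  bool skip_value_in_compaction_filter = false;
};

// Reads the blob value into *out; returns false if the blob file is gone.
using BlobReader = std::function<bool(std::string* out)>;

enum class FilterInput { kValueLoaded, kValueSkipped, kReadFailed };

// Decides what the compaction filter gets to see for a key whose value
// lives in a blob file.
FilterInput PrepareFilterInput(const CfOptions& opts, const BlobReader& read,
                               std::string* value) {
  if (opts.skip_value_in_compaction_filter) {
    // Assumption: the filter sees an empty value and no blob I/O happens.
    value->clear();
    return FilterInput::kValueSkipped;
  }
  std::string blob;
  if (!read(&blob)) return FilterInput::kReadFailed;  // e.g. file purged
  *value = blob;
  return FilterInput::kValueLoaded;
}
```

The trade-off in this model is that a filter which needs to inspect the value cannot do so once the flag is set.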

@guoxiangCN
Contributor Author

guoxiangCN commented Sep 6, 2024

I guess that in the TiDB environment, TiKV only uses the compaction filter in WriteCF, and WriteCF only stores transaction commit information and small values of less than 256 bytes. Moreover, WriteCF does not enable Titan by default, so the error does not occur there.
This issue occurs when TiKV is used with RawKV, or when Titan is used directly.

@v01dstar
Contributor

v01dstar commented Sep 6, 2024

> I guess that in the TiDB environment, TiKV only uses the compaction filter in WriteCF, and WriteCF only stores transaction commit information and small values of less than 256 bytes. Moreover, WriteCF does not enable Titan by default, so the error does not occur there. This issue occurs when TiKV is used with RawKV, or when Titan is used directly.

Yes, I totally missed that. I guess you can leverage skip_value_in_compaction_filter in this case. Or you can propose a simple fix, as you suggested and as the TODO also mentions: return the corresponding error to the caller of Get(), and let the caller (the compaction filter) decide what to do.
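The shape of that fix might look roughly like the following. This is a sketch under stated assumptions rather than the actual patch: `GetStatus`, `GetResult`, `Decision`, `UserFilterWantsRemoval`, and `FilterBlobEntry` are hypothetical names, and the policy of keeping an entry whose blob file has disappeared is an assumption chosen for illustration, not something decided in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical status and decision types, for illustration only.
enum class GetStatus { kOk, kBlobFileMissing, kOtherError };
enum class Decision { kKeep, kRemove };

struct GetResult {
  GetStatus status;
  std::string value;
};

// Stand-in for the user's filter logic: here, drop entries with empty values.
inline bool UserFilterWantsRemoval(const std::string& value) {
  return value.empty();
}

// The caller of Get() (the compaction filter wrapper) interprets the status
// instead of letting the error abort the compaction.
Decision FilterBlobEntry(const GetResult& r) {
  switch (r.status) {
    case GetStatus::kOk:
      return UserFilterWantsRemoval(r.value) ? Decision::kRemove
                                             : Decision::kKeep;
    case GetStatus::kBlobFileMissing:
      // Assumed policy: the value cannot be inspected, so leave the entry
      // untouched rather than crash or guess.
      return Decision::kKeep;
    case GetStatus::kOtherError:
      return Decision::kKeep;
  }
  return Decision::kKeep;  // unreachable; silences compiler warnings
}
```

Keeping the entry is the conservative choice in this model; a real fix may prefer a different policy for the missing-file case.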

@guoxiangCN
Contributor Author

I'll try.
