@hx235 hx235 commented Sep 20, 2025

Context/Summary:

This is stacked on top of #13928.

Flow of resuming: DB::OpenAndCompact() -> Compaction progress file -> SubcompactionProgress -> CompactionJob
Flow of persistence: CompactionJob -> SubcompactionProgress -> Compaction progress file -> DB that is called with OpenAndCompact()

This PR focuses on SubcompactionProgress -> CompactionJob and CompactionJob -> SubcompactionProgress -> Compaction progress file. For now, only a single subcompaction is supported, since OpenAndCompact() does not partition the compaction anyway.

The actual triggering of progress persistence and resuming (i.e., integration) happens through DB::OpenAndCompact() in an upcoming PR.

Resume Flow

  1. input_iter->Seek(next_internal_key_to_compact) // Position iterator
  2. ReadTableProperties() // Validate existing outputs
  3. RestoreCompactionOutputs() in CompactionOutputs // Rebuild output file metadata
  4. Restore critical statistics (processed input and output record counts) for later verification
  5. AdvanceFileNumbers() // Prevent file number conflicts
  6. Continue normal compaction from the positioned iterator, fall back to not resuming the compaction in limited cases, or fail the compaction entirely
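
As a rough illustration of the resume flow above, here is a minimal, self-contained sketch. It does not use the RocksDB types: `ToyProgress`, `ToyJobState`, `ToyOutputFile`, `ResumeResult` and `ResumeFromProgress` are hypothetical names, a sorted vector of strings stands in for the input iterator, and the rules for when to start over versus fail are assumptions made only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for illustration; not the RocksDB types.
struct ToyOutputFile {
  uint64_t file_number;
  uint64_t num_entries;  // what step 2 would read from table properties
};

struct ToyProgress {
  std::string next_key_to_compact;
  std::vector<ToyOutputFile> output_files;
  uint64_t num_processed_input_records = 0;
  uint64_t num_processed_output_records = 0;
};

struct ToyJobState {
  size_t input_pos = 0;  // index into the sorted input, our "iterator"
  std::vector<ToyOutputFile> outputs;
  uint64_t input_records = 0;
  uint64_t output_records = 0;
  uint64_t next_file_number = 1;
};

enum class ResumeResult { kResumed, kStartFromBeginning, kFail };

ResumeResult ResumeFromProgress(const std::vector<std::string>& sorted_input,
                                const ToyProgress* progress,
                                ToyJobState* state) {
  assert(state != nullptr);
  if (progress == nullptr) {
    // Assumption: no persisted progress means a normal, non-resumed run.
    state->input_pos = 0;
    return ResumeResult::kStartFromBeginning;
  }
  // Step 2: validate existing outputs against the persisted record count.
  // Done first in this sketch so that `state` is untouched on failure.
  uint64_t entries = 0;
  uint64_t max_file_number = 0;
  for (const ToyOutputFile& f : progress->output_files) {
    entries += f.num_entries;
    max_file_number = std::max(max_file_number, f.file_number);
  }
  if (entries != progress->num_processed_output_records) {
    // Assumption: a mismatch is treated as a hard failure here.
    return ResumeResult::kFail;
  }
  // Step 1: position the "iterator" at the first key >= the resume key.
  state->input_pos = static_cast<size_t>(
      std::lower_bound(sorted_input.begin(), sorted_input.end(),
                       progress->next_key_to_compact) -
      sorted_input.begin());
  // Step 3: rebuild output file metadata.
  state->outputs = progress->output_files;
  // Step 4: restore record counts for later verification.
  state->input_records = progress->num_processed_input_records;
  state->output_records = progress->num_processed_output_records;
  // Step 5: advance file numbers past every restored output file.
  state->next_file_number =
      std::max(state->next_file_number, max_file_number + 1);
  // Step 6: the caller continues compaction from state->input_pos.
  return ResumeResult::kResumed;
}
```

The three-way result mirrors step 6: resume from the positioned iterator, start from the beginning, or fail.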

Persistence Strategy

  1. When: At each SST file completion (FinishCompactionOutputFile()). This is the simplest choice but also the most expensive frequency. See below for benchmarking and potential follow-up items
  2. What: Serialize, write and sync the in-memory SubcompactionProgress to a dedicated manifest-like file
  3. For simplicity: Only persist at "clean" boundaries (no overlapping user keys, no range deletions, no timestamps for now)
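
To make the "clean boundary" rule concrete, here is a minimal sketch of such a check, assuming it is evaluated right after an output file is finished. `ToyFileBoundary` and `IsCleanBoundary` are hypothetical names, and reading "no overlapping user keys" as "the next key does not share a user key with the last key in the finished file" is an assumption, not something the PR description spells out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical description of the point where an output file was just
// finished; not a RocksDB type.
struct ToyFileBoundary {
  std::string last_user_key_in_file;
  std::string next_user_key;         // ignored when has_next is false
  bool has_next = true;              // false once the input is exhausted
  bool has_range_deletions = false;  // any range deletions involved
  size_t timestamp_size = 0;         // non-zero means timestamps are in use
};

// Returns true only when progress may be persisted at this boundary.
bool IsCleanBoundary(const ToyFileBoundary& b) {
  if (b.timestamp_size != 0) {
    return false;  // timestamps are not supported for now
  }
  if (b.has_range_deletions) {
    return false;  // range deletions are not supported for now
  }
  if (b.has_next && b.next_user_key == b.last_user_key_in_file) {
    return false;  // the same user key would span two output files
  }
  return true;
}
```

When the check fails, this sketch simply skips persisting at that file and tries again at the next file completion.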

Test plan:

  • New unit test at the CompactionJob level covering basic compaction progress resumption
  • Existing UTs and stress/crash test

@meta-cla meta-cla bot added the CLA Signed label Sep 20, 2025
@hx235 hx235 force-pushed the resumable_compaction_job_change branch 2 times, most recently from 214b20a to 359598f Compare September 20, 2025 10:51
@facebook-github-bot
Contributor

@hx235 has imported this pull request. If you are a Meta employee, you can view this in D82889188.

@hx235 hx235 changed the title Resuming and tracking compaction progress in CompactionJob Resuming and persisting compaction progress in CompactionJob Sep 20, 2025
@hx235 hx235 changed the title Resuming and persisting compaction progress in CompactionJob Resuming and persisting subcompaction progress in CompactionJob Sep 20, 2025
@hx235 hx235 requested review from jaykorean and cbi42 September 20, 2025 22:53
@hx235
Contributor Author

hx235 commented Sep 20, 2025

@cbi42 - only if you have time, can you take a look at compaction_job.cc change in commit 359598f (this PR without stacking)?

}

// LIMITATION: Persisting compaction progress with timestamp
// is not supported since the feature of persisting tiemstamp of the key in
Contributor

nit: tiemstamp --> timestamp

assert(input_iter);

Status status =
MaybeResumeSubcompactionProgressOnInputIterator(sub_compact, input_iter);
Contributor

Could we add an option to not use this feature?

Contributor Author

@hx235 hx235 Sep 22, 2025

Hi - that is handled in MaybeResumeSubcompactionProgressOnInputIterator for the case where there is no progress. The actual option is in the OpenAndCompact options; if it is false, no progress is passed in. See the third PR here: c6d04f5#diff-17fbdec07244b1f07d1a4e5aed0a6feecf4474d20b3129818c10fc0ff9f3d547R1400

Contributor

I was more concerned about the scenario where we find an issue after this PR makes it to prod but before your third PR does.

The only way to mitigate it would be to revert this PR. I was wondering if we could put lines 1725 to 1734 entirely under a DB- or CF-wide check, resumable_compaction_enabled or something like that.

And if OpenAndCompactOption::resume_compaction is true while this DB- or CF-wide check is false, we either ignore it or return Status::NotSupported().
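
A minimal sketch of the gating idea in this comment, under stated assumptions: `ToyGateOptions`, `GateDecision` and `DecideResume` are hypothetical names, `resumable_compaction_enabled` is the placeholder name floated above, and `reject_when_disabled` is an invented switch that only exists to show both proposed behaviors (ignore versus reject).

```cpp
#include <cassert>

// Hypothetical options for illustration; not RocksDB options.
struct ToyGateOptions {
  bool resumable_compaction_enabled;  // DB- or CF-wide kill switch
  bool resume_requested;              // per-call request to resume
  bool reject_when_disabled;          // choose between the two proposals
};

enum class GateDecision { kResume, kRunWithoutResume, kNotSupported };

GateDecision DecideResume(const ToyGateOptions& o) {
  if (!o.resume_requested) {
    return GateDecision::kRunWithoutResume;
  }
  if (o.resumable_compaction_enabled) {
    return GateDecision::kResume;
  }
  // Resume was requested but the feature is switched off: either ignore the
  // request silently or reject the call, depending on the chosen policy.
  return o.reject_when_disabled ? GateDecision::kNotSupported
                                : GateDecision::kRunWithoutResume;
}
```

With a gate like this, turning the wide check off would disable the resume path without reverting the code.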

if (status.IsIncomplete()) {
input_iter->SeekToFirst();
} else if (!status.ok()) {
sub_compact->status = status;
Contributor

If we hit an IO issue while resuming the compaction, or the tmp output files are corrupted for any reason, I'm wondering whether it's better to fail the current compaction with Corruption() or to still retry from the beginning.

Contributor Author

@hx235 hx235 Sep 22, 2025

Good question - how to set a good contract and specify in the OpenAndCompact() API when not to retry. I haven't written that yet in my third PR and want to wait until we chat tomorrow. My take is that we only resume when OpenAndCompact() crashes or returns Shutdown or ManualCompactionPaused, and for no other status. After #13891, Shutdown and ManualCompactionPaused should not override other non-ok statuses, so they should be safe to rely on. These two error statuses are also unlikely to be expanded to new use cases within RocksDB.
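
The proposed contract can be written down as a small predicate. This is only a sketch of the rule as stated in this comment: `PrevOutcome` and `ShouldResume` are hypothetical names, and the enum is a toy stand-in for how the previous OpenAndCompact() attempt ended, not the RocksDB Status type.

```cpp
#include <cassert>

// Hypothetical summary of how the previous attempt ended; not a RocksDB type.
enum class PrevOutcome {
  kCrashed,
  kShutdown,
  kManualCompactionPaused,
  kCorruption,
  kIOError,
  kOk
};

// Resume only after a crash, Shutdown or ManualCompactionPaused.
// Any other outcome means the next attempt does not resume.
bool ShouldResume(PrevOutcome outcome) {
  switch (outcome) {
    case PrevOutcome::kCrashed:
    case PrevOutcome::kShutdown:
    case PrevOutcome::kManualCompactionPaused:
      return true;
    case PrevOutcome::kCorruption:
    case PrevOutcome::kIOError:
    case PrevOutcome::kOk:
      return false;
  }
  return false;  // unreachable for valid enum values
}
```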

@hx235 hx235 force-pushed the resumable_compaction_job_change branch from 359598f to 815d7f3 Compare September 23, 2025 06:02