Resume remote compaction aborted due to primary restart #12177
c82c2cc to 7b881cd
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
7b881cd to e669318
@hx235 has updated the pull request. You must reimport the pull request before landing.
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
e669318 to fde50d9
@hx235 has updated the pull request. You must reimport the pull request before landing.
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@@ -0,0 +1 @@
Provide an experimental option `Options::resume_compaction` to resume unfinished compactions left from the last db session. Right now only unfinished remote compactions due to primary db restart or failed remote compaction are supported. This options is turned on by default and has no effect to users with no remote compaction (i.e, `Options::compaction_service == nullptr`) or disable auto compaction (i.e, `Options::disable_auto_compactions = true`)
minor TODO: "... this option"
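The release note above says the option has no effect without a compaction service or with auto compactions disabled. A minimal sketch of that condition, using stand-in types (`ToyOptions`, `ToyCompactionService`, and `ResumeTakesEffect` are hypothetical names for illustration, not RocksDB's actual classes or the check this PR implements):

```cpp
#include <cassert>
#include <memory>

// Stand-in for a remote compaction service (hypothetical, illustration only).
struct ToyCompactionService {};

// Stand-in mirroring only the three fields named in the release note.
struct ToyOptions {
  bool resume_compaction = true;  // on by default, per the release note
  std::shared_ptr<ToyCompactionService> compaction_service;  // nullptr: no remote compaction
  bool disable_auto_compactions = false;
};

// Assumed reading of the note: resuming matters only when remote compaction
// is configured and auto compactions are enabled.
bool ResumeTakesEffect(const ToyOptions& o) {
  return o.resume_compaction && o.compaction_service != nullptr &&
         !o.disable_auto_compactions;
}
```

With the defaults in this sketch, `ResumeTakesEffect` returns false until a compaction service is set.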
metadata.clear();
db_->GetLiveFilesMetaData(&metadata);
if (compaction_unfinished_ && resume_compaction) {
  ASSERT_LT(metadata.size(), prev_reopen_live_file_num);
minor TODO: assert that the sync point is called, even though manually tracing through the debugger shows it is called.
Context:
If the primary db is restarted after requesting a remote compaction but before installing its result, the same compaction is scheduled and requested again as if it were new. The progress already made at the remote site is therefore wasted.
Summary:
This PR allows the restarted primary db to wait for the remote compaction to return from the remote site instead of rescheduling an identical new one. At a high level, we persist the essential compaction information in the manifest so that the corresponding remote compaction can be waited for. Upon restart, we reconstruct the in-memory state needed to wait for the remote compaction and to prevent new compactions scheduled after the restart from conflicting with it.
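The persist-then-reconstruct idea in the summary can be sketched with a toy model. Everything here is an assumption made for illustration: `PendingRemoteCompaction`, `ToyManifest`, `RecoveredState`, `Recover`, and `ConflictsWithPending` are hypothetical names, and the record layout is not the manifest format this PR uses.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical record written when a remote compaction is requested.
struct PendingRemoteCompaction {
  std::string job_id;                 // identifier used to wait on the remote job
  std::vector<uint64_t> input_files;  // file numbers handed to the remote side
  int output_level;
};

// Toy manifest: requested jobs plus the ids of jobs already installed.
struct ToyManifest {
  std::vector<PendingRemoteCompaction> requested;
  std::set<std::string> finished;
};

// In-memory state rebuilt after a restart.
struct RecoveredState {
  std::vector<PendingRemoteCompaction> to_wait_for;
  std::set<uint64_t> files_being_compacted;
};

// Replay the manifest: every requested-but-unfinished job is waited for,
// and its input files are marked as busy.
RecoveredState Recover(const ToyManifest& manifest) {
  RecoveredState state;
  for (const auto& rec : manifest.requested) {
    if (manifest.finished.count(rec.job_id) > 0) continue;
    state.to_wait_for.push_back(rec);
    state.files_being_compacted.insert(rec.input_files.begin(),
                                       rec.input_files.end());
  }
  return state;
}

// A new compaction conflicts if it picks any file that a pending remote
// compaction already owns.
bool ConflictsWithPending(const RecoveredState& state,
                          const std::vector<uint64_t>& candidate_inputs) {
  for (uint64_t f : candidate_inputs) {
    if (state.files_being_compacted.count(f) > 0) return true;
  }
  return false;
}
```

In this sketch, a compaction picker run after restart would call `ConflictsWithPending` before scheduling, so files owned by an in-flight remote job are left alone until that job returns.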
Test:
TEST_F(CompactionServiceResumableCompactionTest, ResumableCompaction)
Add Options::resume_compaction to crash test to ensure it has no impact on existing features when remote compaction is not used.
Limitations: