Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resume remote compaction aborted due to primary restart #12177

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hx235
Copy link
Contributor

@hx235 hx235 commented Dec 23, 2023

Context:
If the primary db is restarted after requesting a remote compaction but before installing the compaction, the same compaction will be scheduled and requested like a new compaction again. Therefore, the compaction progress made in the remote site will be wasted.

Summary:
This PR allows the restarted primary db wait for the remote compaction to return from the remote site instead of rescheduling a same new one. At the high level, we persist essential compaction information in the manifest to wait for the corresponding remote compaction. So upon restart, we can reconstruct the memory state to wait for the remote compaction and prevent compaction conflict from other new compaction after restart.

Test:

  • New UT TEST_F(CompactionServiceResumableCompactionTest, ResumableCompaction)
  • Add Options::resume_compaction to crash test to ensure it has no impact on existing feature when remote compaction is not used.
    • Crash test currently does not use remote compaction.
  • Run stress test patch with dummy remote service to test upgrade/downgrade compatibly on manifest

Limitations:

  • Failed remote compaction will also be resumed on next db open in addition to the unfinished ones due to primary db restart (noted in API)
  • Failed resumed remote compaction will not fall back to local in any case since we now only persist minimum compaction information which can be less than local compaction needs.
  • Compaction are only resumed once. If the resume fail, we will not resume it again and the failure is like any other compaction failure.
  • If a second reopen happens quickly after the first reopen but before the first reopen is able to finish resuming the compaction, then the second reopen may not be able to resume the compaction. That's because the compaction resumed in the 1st db open is not persisted to manifest and the manifest with that info can be deleted before the 2nd DB reopen.

@hx235 hx235 changed the title Resume remote compaction aborted due to primary restart or failure Resume remote compaction aborted due to primary restart Dec 23, 2023
@hx235 hx235 force-pushed the resume_remote_compaction branch 2 times, most recently from c82c2cc to 7b881cd Compare December 23, 2023 05:40
@facebook-github-bot
Copy link
Contributor

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -0,0 +1 @@
Provide an experimental option `Options::resume_compaction` to resume unfinished compactions left from the last db session. Right now only unfinished remote compactions due to primary db restart or failed remote compaction are supported. This options is turned on by default and has no effect to users with no remote compaction (i.e, `Options::compaction_service == nullptr`) or disable auto compaction (i.e, `Options::disable_auto_compactions = true`)
Copy link
Contributor Author

@hx235 hx235 Dec 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor TODO: "... this option"

metadata.clear();
db_->GetLiveFilesMetaData(&metadata);
if (compaction_unfinished_ && resume_compaction) {
ASSERT_LT(metadata.size(), prev_reopen_live_file_num);
Copy link
Contributor Author

@hx235 hx235 Dec 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor TODO: assert sync point is called even manually tracing through debugger shows it is called.

@hx235 hx235 requested a review from ajkr December 25, 2023 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants