Fix potential incorrect result for duplicate key in MultiGet #12295
Conversation
@anand1976 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
A high-level question, @anand1976 (referring to lines 665 to 666 in 377eee7):
So if that expectation is agreed upon, should we stop and return when the first duplicate encounters an SST read error? Right now, the code seems like a workaround, since it instead ensures the remaining duplicates have the correct key.
@@ -688,6 +689,12 @@ class FilePickerMultiGet {
user_comparator_->CompareWithoutTimestamp(
Overall I found the purpose of "if (cmp_largest == 0)" hard to understand, and the comment does not help much.
My main question is:
If we agree with "it keeps moving forward until the last key in the batch that falls in that file" as mentioned in https://github.com/facebook/rocksdb/pull/12295/files#diff-6270b3486fea620597e24151dd0d75c2c14b3c4c30d62e0567811723628733deR623-R625, I don't see why we can't merge the case cmp_largest == 0 with cmp_largest < 0.
In particular, the comment about this case ("which means the next key will not be in this file, so stop looking further") seems to assume there aren't any duplicate keys. If we don't plan to change/clarify MultiGet() on that aspect, I don't know why we still keep this special path. Also, should we still allow MultiGet() to take duplicate keys for one CF? Duplicate keys make the logic harder to reason about.
For the above question, I was once wondering if it's because we need to, as the comment mentioned, "..leave batch_iter as is since we may have to pick up from there for the next file, if this file has a merge value rather than". But again, I don't understand why we didn't do the same ("leave batch_iter_ as is") for the case where the duplicate keys lie in the middle of the file instead of at the end.
The other condition needed for the bug is that the first duplicate key is (a) marked as done while the SST lookup returns an error, which then (b) leads to the rest of the duplicate keys being skipped (https://github.com/facebook/rocksdb/blob/main/db/version_set.cc#L445-L448). For (a), why didn't we surface this error all the way to the user? For (b), skipping the rest of the duplicate keys doesn't sound aligned with the API description (lines 665 to 666 in 377eee7).
Should we do something about this condition too?
@hx235 There's no contract to treat duplicate keys in a special manner. The way I read it, we won't de-duplicate the keys, i.e. we will not attempt to return an identical result for each of the duplicates. I feel it's more trouble than it's worth to try to ensure we return an identical status and value. I'd rather just make it explicit in the comment that a different status may be returned depending on what happens during processing.
We do. It's marked as done in the batch, and the status is set to the IO error.
See the previous comment on this.
e187c78 to ee2e022 (compare)
@anand1976 has updated the pull request. You must reimport the pull request before landing.
@anand1976 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
LGTM, though I still wonder if there is anything we could do about https://github.com/facebook/rocksdb/pull/12295/files#r1472259193, maybe as a follow-up / tech debt item.
You mean try to simplify the
I meant whether we can not have
@anand1976 merged this pull request in 95b41ee.
RocksDB correctness testing recently discovered a possible, but very unlikely, correctness issue with MultiGet. The issue happens when all of the below conditions are met:
The problem is that, in FilePickerMultiGet, when looking up keys in a level, we use the FileIndexer and the overlapping file in the current level to determine the search bounds for that key in the file list of the next level. If the next level is empty, the search bounds are reset and we do a full binary search in the next non-empty level's LevelFilesBrief. However, under conditions #1 and #2 listed above, only the first of the duplicates has its next-level search bounds updated, and the remaining duplicates are skipped.
Test plan:
Add unit tests that fail an assertion or return a wrong result without the fix.