KeySegmentsExtractor and prototype higher-dimensional filtering #12075
Conversation
Force-pushed b45b60b to f8e0f37
Force-pushed 018e2f6 to 948fdd6
Summary: This adds an EXPERIMENTAL new API for splitting keys into segments and filtering on those segments, especially a filter that records the value ranges of segments and uses them to filter range queries. I sometimes call this "higher-dimensional filtering" because it goes beyond treating the space of keys as a single-dimensional range to be approximated. Instead, we can treat key segments (beyond the first) as data points in another dimension and filter out irrelevant SST files on that dimension alone.

A common pattern is for a key segment to correspond to some monotonic identifier or timestamp. A range query might look for a range of those identifiers within a particular prefix leading up to that identifier. If the range of identifiers in an SST file doesn't overlap that range (regardless of the leading prefix), we can filter the file from being read by a bounded iterator. For example, if querying for a range of newer entries, older SST files will be filtered out.

The current implementation is a prototype in the experimental.h header and rocksdb::experimental namespace. It currently uses table properties and table_filter to store and apply these new filters. That is certainly temporary, but I have put some significant thought and engineering into the rest of the new APIs, which I expect to be close to refined enough for production.

For details, see the new public APIs in experimental.h. For a detailed example, see the new unit test in db_bloom_filter_test.

Test Plan: Unit test included
Force-pushed 948fdd6 to 81caa55
@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pdillinger has updated the pull request. You must reimport the pull request before landing.
@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pdillinger Thank you for this work, it looks awesome! I only have some minor comments.
// (That performance optimization is the only reason this function is here
// rather than in SstQueryFilterConfigsManager.)
virtual std::function<bool(const TableProperties&)>
GetTableFilterForRangeQuery(Slice lower_bound_incl,
nit: how about using const reference for the bounds input arguments?
The nature of this function is that we need to save a copy of the Slices. A plain Slice parameter allows, but doesn't require, move semantics.
It probably doesn't matter here for efficiency (nothing owned by the object to deep copy), but I like the certainty of knowing that the called function isn't accidentally saving the const X& when the function clearly needs to save a copy.
for (size_t i = 0; i < filter_count; ++i) {
  uint64_t filter_len;
  if (i + 1 == filter_count) {
Maybe add error handling here for when p goes beyond limit?
Assuming GetVarint64Ptr meets its contract, it's fully checked by static_cast<size_t>(limit - p) < filter_len (followed by p += filter_len) on the previous iteration. If it's violated in the future, ASAN should catch it.
db/experimental.cc
Outdated
  return 0;
} else {
  // For now in the code, wraps only 1 filter, but schema supports multiple
  return 1 + VarintLength(CategorySetToUint(categories_)) + 1 +
Should this be VarintLength(1) instead of 1?
Was relying on my knowledge that VarintLength(1) == 1. I'll improve the clarity.
db/experimental.cc
Outdated
// outer layer will have just 1 filter in its count (added here)
// and this filter wrapper will have filters_to_finish.size()
// (added above).
total_size += 1;
This 1 is for the space used to record the number of filters, right? Should this be VarintLength(1) instead of 1?
// segment or composite (range of segments). The empty value is tracked
// and filtered independently because it might be a special case that is
// not representative of the minimum in a spread of values.
kBytewiseMinMaxFilter = 0x10,
Just curious why kBytewiseMinMaxFilter is 0x10, not 0x3?
Adding this before:
// ... (reserve some values for more wrappers)
private:
  static const std::string kTablePropertyName;
  static constexpr char kSchemaVersion = 1;
The schema encoding is important for understanding the filter building and reading logic. Maybe we can document how schema version 1 encodes filters?
bool MayMatch_ExtrAndCatFilterWrapper(Slice wrapper) const {
  assert(!wrapper.empty() && wrapper[0] == kExtrAndCatFilterWrapper);
  if (wrapper.size() <= 4) {
IIUC, kExtrAndCatFilterWrapper is followed by extractor_id length and extractor_id; how is this fast-failing size 4 determined?
  return true;
}
Slice smallest = Slice(p, smallest_size);
p += smallest_size;
Maybe add error handling here for p going beyond limit after incrementing it by smallest_size.
Already checked static_cast<size_t>(limit - p) <= smallest_size
db/db_bloom_filter_test.cc
Outdated
EXPECT_EQ(RangeQueryKeys("abc_170", "abc_190"), Keys({}));
EXPECT_EQ(TestGetAndResetTickerCount(options, NON_LAST_LEVEL_SEEK_DATA), 2);

// Control 3: range is not filtered because prefixes not represented
What does this statement mean, that the prefixes in "baa_170" to "baa_190" are not represented?
// May match if both the upper bound and lower bound indicate there could
// be overlap
return upper_bound_input.compare(smallest) >= 0 &&
Since the upper bound is exclusive, if it's the same as smallest, can we also filter out the table?
I've expanded the TODO to record why that change isn't safe:
// TODO: potentially fix upper bound to actually be exclusive, but it's not
// as simple as changing >= to > below, because it's upper_bound_excl that's
// exclusive, and the upper_bound_input part extracted from it might not be.
@pdillinger has updated the pull request. You must reimport the pull request before landing.
@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pdillinger merged this pull request in 1201813.
Summary: This change contains a prototype new API for "higher dimensional" filtering of read queries. Existing filters treat keys as one-dimensional, either as distinct points (whole key) or as contiguous ranges in comparator order (prefix filters). The proposed KeySegmentsExtractor allows treating keys as multi-dimensional for filtering purposes even though they still have a single total order across dimensions. For example, consider these keys in different LSM levels:
L0:
abc_0123
abc_0150
def_0114
ghi_0134
L1:
abc_0045
bcd_0091
def_0077
xyz_0080
If we get a range query for [def_0100, def_0200), a prefix filter (up to the underscore) will tell us that both levels are potentially relevant. However, if each SST file stores a simple range of the values for the second segment of the key, we would see that L1 only has [0045, 0091] which (under certain required assumptions) we are sure does not overlap with the given range query. Thus, we can filter out processing or reading any index or data blocks from L1 for the query.
This kind of case shows up with time-ordered data but is more general than filtering based on user timestamp. See #11332 . Here the "time" segments of the keys are meaningfully ordered with respect to each other even when the previous segment is different, so summarizing data along an alternate dimension of the key like this can work well for filtering.
This prototype implementation simply leverages existing APIs for user table properties and table filtering, which is not very CPU efficient. Eventually, we expect to create a native implementation. However, I have put some significant thought and engineering into the new APIs overall, which I expect to be close to refined enough for production.
For details, see new public APIs in experimental.h. For a detailed example, see the new unit test in db_bloom_filter_test.
Test Plan: Unit test included