General principles for index quals
==================================

Tomas Vondra's "evaluate filters on the index tuple" patch works by
turning expressions that couldn't become index quals into index filters
(not table filters) where possible. The general idea is that the
planner should recognize a kind of "qual hierarchy": it is always
preferable for each index path to use true index quals where possible,
but, failing that, it is still preferable to make the predicate into an
index filter rather than a table filter.

In planner: Index Quals > Index Filter > Table Filter

Although this general design makes sense, it can nevertheless lead to
suboptimal outcomes: situations where the planner chooses a plan that
is obviously not as good as some hypothetical other plan that manages
to use true index quals through some (currently unimplemented) other
means. This isn't really a problem with the patch itself; it is more a
case of the planner lacking complementary optimizations that make
predicates into true index quals before the mechanism added by the
patch is even reached. The "catch-all" nature of the patch isn't a
problem to be fixed -- its very general nature is a key strength of the
approach (it uses full expression evaluation, which is much more
general than any approach that ultimately reduces predicates to
indexable operators from an opclass).

This document lists certain specific scenarios where the patch doesn't
do quite as well as one would hope. That puts things on a good footing
for adding complementary techniques later on. Theoretically this is
independent work that could have happened at any point in the past, but
the patch has brought these related issues into sharp relief for the
first time -- so it doesn't necessarily feel all that independent. A
working "taxonomy" for all of these techniques seems useful, even one
such as this, which is acknowledged to be incomplete and far from
perfect.
Why index filters are better than equivalent table filters
----------------------------------------------------------

This part is very simple.

+ They're better because they have a decent chance of avoiding
  significant amounts of heap access during expression evaluation.

This extends some of the benefits of index-only scans to plain index
scans.

Why index quals are better than equivalent index filters
--------------------------------------------------------

The advantages that index quals have over index filters can be much
less clear than one would hope, because index filters are confusingly
similar to index quals. But index quals are significantly better
overall. This is guaranteed to be true, assuming we're comparing
"equivalent" index quals and index filters -- logically interchangeable
forms of the same quals with different physical/runtime
characteristics. (An index filter might be better than an index qual
when it happens to be much more selective, because it filters tuples in
some fundamentally different way, but that's not relevant to this
discussion. We're focused on clear-cut cases, where we can speak with
the most confidence, since that's a useful basis for further
improvements later on.)

We may even be missing out on the advantages of true index quals in
cases where the patch already offers big improvements compared to what
was possible in Postgres 16 -- since, of course, index filters can
still be much better than table filters. To make matters more
confusing, the consequences may only be apparent at certain times for a
given query, and minor variants of the same query can be very sensitive
to whether true index quals or mere index filters get used. We must
keep all of this in perspective, which is hard.
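To make the hierarchy concrete, here is a minimal Python model of
expected heap fetches (purely illustrative -- not PostgreSQL code; the
function name and all numbers are hypothetical): an index filter
evaluated against the index tuple can skip the heap fetch for rows it
rejects on all-visible pages, while a table filter must fetch every row
that survives the index quals.

```python
def expected_heap_fetches(matching_tuples, filter_selectivity,
                          all_visible_frac, filter_on_index):
    """Toy model. A filter evaluated on the index tuple can skip the
    heap fetch for rows it rejects, provided the page is all-visible;
    a table filter must fetch the heap tuple before it can run."""
    if filter_on_index:
        # Rejected rows on all-visible pages never touch the heap.
        skipped = (matching_tuples
                   * (1 - filter_selectivity)
                   * all_visible_frac)
        return matching_tuples - skipped
    # Table filter: every row matching the index quals is fetched.
    return matching_tuples

# 2500 rows match the index quals; the filter rejects all of them,
# and every heap page is all-visible.
print(expected_heap_fetches(2500, 0.0, 1.0, True))   # 0.0 heap fetches
print(expected_heap_fetches(2500, 0.0, 1.0, False))  # 2500 heap fetches
```

Under these (deliberately extreme) assumptions the index-filter path
does no heap I/O at all, while the table-filter path fetches every
matching row -- which is the shape of the gap seen in example 1 below.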
Example 1: bitmap index scan, prefer index quals
------------------------------------------------

+ Index quals are better than equivalent index filters because bitmap
  index scans can only use index quals.

If we have an index on "tenk1 (four, two)", then we miss out on the
opportunity to eliminate heap accesses for a query like this one:

select ctid, * from tenk1 where four = 1 and two != 1;

This gives us a "bad" plan (at least relative to the ideal case), since
there is an excessive amount of heap access:

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                               QUERY PLAN                                                                              │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Bitmap Heap Scan on public.tenk1  (cost=25.35..407.85 rows=1250 width=250) (actual time=0.707..0.707 rows=0 loops=1)                                  │
│   Output: ctid, unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4 │
│   Recheck Cond: (tenk1.four = 1)                                                                                                                      │
│   Filter: (tenk1.two <> 1)                                                                                                                            │
│   Rows Removed by Filter: 2500                                                                                                                        │
│   Heap Blocks: exact=345                                                                                                                              │
│   Buffers: shared hit=349                                                                                                                             │
│   ->  Bitmap Index Scan on tenk1_four_two_idx  (cost=0.00..25.04 rows=2500 width=0) (actual time=0.073..0.074 rows=2500 loops=1)                      │
│         Index Cond: (tenk1.four = 1)                                                                                                                  │
│         Buffers: shared hit=4                                                                                                                         │
│ Planning Time: 0.037 ms                                                                                                                               │
│ Execution Time: 0.719 ms                                                                                                                              │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Here we see a bitmap index scan plan (one that uses our composite
index), which makes sense overall. But the details beyond that make no
sense -- we're using table filter quals for "two".
It turns out that the bitmap heap scan accesses every single heap page
in the tenk1 table as a result, even though we could have avoided all
heap accesses had we been able to push down the != as an index qual. As
things stand, the patch gives an enormous advantage to index scans over
bitmap index scans, but the costing fails to capture that difference.
That may be okay in many individual cases, but it seems questionable
here.

Intuitively, there is no particular reason why this case (involving a
simple inequality) couldn't find a way to use index quals instead of
relying on index filters. That would be in keeping with the traditional
tendency of index scans to do approximately the same thing as similar
bitmap index scans (just in a slightly different order). This example
relies on the observation that we really could make != into just
another indexable operator, since it is already a kind of honorary
member of the btree opclass. That is pretty clear cut -- nbtree already
knows how to do a bunch of fairly similar tricks.

Just for context, here is the patch's "good" index scan plan (or better
plan, at least).
This plan does the trick that we'd ideally see from index scans and
bitmap index scans alike (namely, it avoids all heap accesses):

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                               QUERY PLAN                                                                              │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using tenk1_four_two_idx on public.tenk1  (cost=0.29..709.59 rows=1250 width=250) (actual time=0.300..0.300 rows=0 loops=1)                │
│   Output: ctid, unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4 │
│   Index Cond: (tenk1.four = 1)                                                                                                                        │
│   Index Filter: (tenk1.two <> 1)                                                                                                                      │
│   Rows Removed by Index Recheck: 2500                                                                                                                 │
│   Filter: (tenk1.two <> 1)                                                                                                                            │
│   Buffers: shared hit=5                                                                                                                               │
│ Planning Time: 0.035 ms                                                                                                                               │
│ Execution Time: 0.311 ms                                                                                                                              │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

(Though of course it should ideally make everything into "Index Cond:"
index quals -- we'd need to teach nbtree how to do that in some related
patch that I have yet to start work on.) Costing each variant becomes
much less important if we can find a way to make all useful runtime
behavior available with both plan variants.

Example 2: using index filters is inevitable and natural
--------------------------------------------------------

Similarly, we can imagine a somewhat similar query where the only
possible approach is to use the index filters put there by the patch.
Here we use the same index as before, on "tenk1 (four, two)":

select ctid, * from tenk1 where four = 1 and 2/two = 1;

This gives us a good plan:

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                               QUERY PLAN                                                                              │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using tenk1_four_two_idx on public.tenk1  (cost=0.29..59.14 rows=14 width=250) (actual time=0.321..0.321 rows=0 loops=1)                   │
│   Output: ctid, unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4 │
│   Index Cond: (tenk1.four = 1)                                                                                                                        │
│   Index Filter: ((2 / tenk1.two) = 1)                                                                                                                 │
│   Rows Removed by Index Recheck: 2500                                                                                                                 │
│   Filter: ((2 / tenk1.two) = 1)                                                                                                                       │
│   Buffers: shared hit=5                                                                                                                               │
│ Planning Time: 0.043 ms                                                                                                                               │
│ Execution Time: 0.334 ms                                                                                                                              │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Here we must do a visibility check in order to know ahead of time that
any division-by-zero errors are limited to those index tuples actually
visible to our MVCC snapshot. That factor makes it very clear that the
only sensible way to do this is with an index filter -- index AMs
cannot be expected to do true expression evaluation. One would hope
that the optimizer would give full credit to the index scan plan over a
similar, potentially much slower bitmap index scan plan (where the use
of table filters is inevitable, and has the potential to make things
far slower). Whether or not that happens is ultimately a fairly
standard question of costing, which makes it out of scope for this
document.

Our first and second examples can each be placed into one of two
different buckets in a fairly clear-cut, black-and-white way.
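The contrast between the two buckets can be sketched in a few lines of
Python (purely illustrative pseudocode, not PostgreSQL internals; the
index layout and TIDs are made up). If a bitmap index scan could
evaluate a simple inequality like example 1's "two != 1" against the
index tuple, it would set TID bits only for rows that pass both quals,
so the heap scan would visit far fewer pages -- whereas example 2's
expression must wait for a visibility check, so it can never run inside
the index AM at all.

```python
# Hypothetical composite index on (four, two); entries are
# (four, two, tid) triples, in index order.
index = [
    (1, 1, 101), (1, 1, 102), (1, 3, 103), (2, 1, 104),
]

def bitmap_scan_quals_only(entries):
    # Today: only "four = 1" is an index qual, so every matching TID
    # goes into the bitmap; "two != 1" runs later, against heap tuples.
    return {tid for four, two, tid in entries if four == 1}

def bitmap_scan_with_neq(entries):
    # Hypothetical: "two != 1" is also evaluated inside the index AM,
    # so rejected rows never generate heap accesses.
    return {tid for four, two, tid in entries if four == 1 and two != 1}

print(sorted(bitmap_scan_quals_only(index)))  # [101, 102, 103]
print(sorted(bitmap_scan_with_neq(index)))    # [103]
```

In this toy layout the hypothetical != index qual shrinks the bitmap
from three TIDs to one before the heap is ever touched, which is
exactly the effect example 1's "bad" plan misses out on.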
There is also a huge amount of gray area that one can think of, but
that's out of scope for this document. This document aims to be a
starting point, providing initial guidelines for how to categorize any
of these gray-area cases. Of course, our ability to implement various
transformations as a practical matter (e.g., teaching nbtree to deal
with inequalities natively) is bound to play a very important role in
practice -- maybe even the largest role.

Example 3: "risky" plan shifts away from a BitmapOr plan
--------------------------------------------------------

+ Index quals are better than index filters because they have the
  potential to give the index AM important context.
+ Index quals are better than index filters because they can reliably
  avoid heap accesses, no matter how the VM is set.

Currently, the patch has one notable change in regression test output,
for this query:

SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);

Master branch plan, using the regression tests' composite index on
"tenk1 (thousand, tenthous)":

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                          QUERY PLAN                                                                                         │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Bitmap Heap Scan on public.tenk1  (cost=6.89..8.91 rows=1 width=244) (actual time=0.010..0.011 rows=1 loops=1)                                                              │
│   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4                             │
│   Recheck Cond: (((tenk1.thousand = 42) AND (tenk1.tenthous = 1)) OR ((tenk1.thousand = 42) AND (tenk1.tenthous = 3)) OR ((tenk1.thousand = 42) AND (tenk1.tenthous = 42))) │
│   Heap Blocks: exact=1                                                                                                                                                      │
│   Buffers: shared hit=7                                                                                                                                                     │
│   ->  BitmapOr  (cost=6.89..6.89 rows=1 width=0) (actual time=0.008..0.009 rows=0 loops=1)                                                                                  │
│         Buffers: shared hit=6                                                                                                                                               │
│         ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..2.29 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)                                           │
│               Index Cond: ((tenk1.thousand = 42) AND (tenk1.tenthous = 1))                                                                                                  │
│               Buffers: shared hit=2                                                                                                                                         │
│         ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..2.29 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=1)                                           │
│               Index Cond: ((tenk1.thousand = 42) AND (tenk1.tenthous = 3))                                                                                                  │
│               Buffers: shared hit=2                                                                                                                                         │
│         ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..2.29 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1)                                           │
│               Index Cond: ((tenk1.thousand = 42) AND (tenk1.tenthous = 42))                                                                                                 │
│               Buffers: shared hit=2                                                                                                                                         │
│ Planning Time: 0.059 ms                                                                                                                                                     │
│ Execution Time: 0.028 ms                                                                                                                                                    │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

The patch uses this plan instead, which is the faster plan according to
every traditional measure:

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                   QUERY PLAN                                                    │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using tenk1_thous_tenthous on public.tenk1  (cost=0.29..4.38 rows=1 width=244) (actual time=0.010..0.012 rows=1 loops=1)
│   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
│   Index Cond: (tenk1.thousand = 42)
│   Index Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42))
│   Rows Removed by Index Recheck: 9
│   Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42))
│   Buffers: shared hit=4
│ Planning Time: 0.051 ms
│ Execution Time: 0.023 ms
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Notably, the original BitmapOr plan only accesses one heap page buffer.
So the patch gets a faster plan by saving on index page accesses, even
though that wasn't really its intent.

Example 4: alternative approach, by SAOP patch
----------------------------------------------

Another patch (this one from Peter Geoghegan) improves SAOP execution.
If you run the same query with that patch applied, rewritten to use an
IN ( ... ), you get the following similar though slightly better plan:

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                   QUERY PLAN                                                    │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using tenk1_thous_tenthous on public.tenk1  (cost=0.29..8.33 rows=1 width=244) (actual time=0.008..0.008 rows=1 loops=1)
│   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
│   Index Cond: ((tenk1.thousand = 42) AND (tenk1.tenthous = ANY ('{1,3,42}'::integer[])))
│   Buffers: shared hit=3
│ Planning Time: 0.043 ms
│ Execution Time: 0.017 ms
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

This plan does the absolute bare minimum number of buffer accesses,
since it never repeats an access to the same buffer. That is a key
design goal for the SAOP patch. In itself that isn't terribly
interesting just yet -- what's interesting is that the plan is more
_robust_, in the sense that it will _reliably_ perform a lot better
than the similar index filter plan, even if various closely related
things happen to shift.
Suppose, for example, we do this:

insert into tenk1 (thousand, tenthous) select 42, i from generate_series(43, 1000) i;

With the SAOP patch, the situation at execution time remains completely
unchanged: if we re-execute the query, we once again get 3 buffer hits
(the plan will _reliably_ do the absolute minimum number of page
accesses, as already described). Meanwhile, this is what we see with
the index filter patch -- much higher execution cost compared to our
original example 3, due to a change that doesn't affect any of the rows
the query needs to return anyway:

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                   QUERY PLAN                                                    │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using tenk1_thous_tenthous on public.tenk1  (cost=0.29..4.38 rows=1 width=244) (actual time=0.020..0.382 rows=1 loops=1)
│   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
│   Index Cond: (tenk1.thousand = 42)
│   Index Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42))
│   Rows Removed by Index Recheck: 967
│   Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42))
│   Buffers: shared hit=336
│ Planning Time: 0.088 ms
│ Execution Time: 0.402 ms
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

To be fair, the index filter patch wouldn't have done too badly had we
consistently used SAOP syntax (it would have behaved just like Postgres
16, which isn't "robust", but also isn't as "risky").
It gets the same plan as the SAOP patch does, except that we see the
same issue with excessive primitive index scans/repeated index page
accesses:

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                   QUERY PLAN                                                    │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using tenk1_thous_tenthous on public.tenk1  (cost=0.29..8.90 rows=1 width=244) (actual time=0.011..0.012 rows=1 loops=1)
│   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
│   Index Cond: ((tenk1.thousand = 42) AND (tenk1.tenthous = ANY ('{1,3,42}'::integer[])))
│   Buffers: shared hit=7
│ Planning Time: 0.047 ms
│ Execution Time: 0.022 ms
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

The extra primitive index scans are of minimal concern here (though
that might not be true in more elaborate cases with hundreds of
constants). The real problem seems to be a perverse interaction with OR
expression evaluation. It's already possible to effectively transform
the expression into index quals for the BitmapOr path, through other
means. Ideally, we'd consistently "normalize to CNF" earlier, so that
we'd at least get a "robust" SAOP-based plan in most simple cases
involving OR expressions (no less robust than in Postgres 16). That
would mean the index filter patch would get a plan that was expected to
be more expensive, but also one that is relatively robust. And once we
had the main SAOP patch, everything would be strictly better here.
In general, a SAOP (with nbtree, which has native support for SAOP
clauses) passes down true index quals that give the index AM the
context required to terminate scans early and to avoid heap accesses.
While the SAOP patch pushes that idea a lot further, it isn't really a
new idea.

It should also be noted that the problem with the index filter patch is
one of degree. Even Postgres 16 will eventually hit the same problem,
given a sufficiently elaborate OR-based expression -- the cost of
repeated index accesses pushes things in that direction eventually. The
SAOP patch reliably avoids the issue by reliably avoiding repeat
accesses to index pages. This means that the plan we saw is always the
cheapest plan, no matter how the details are varied. We simply never
have to make a trade-off between heap page accesses and index page
accesses -- the SAOP plan is literally guaranteed to win, no matter
what happens at runtime.

For that reason, "risky OR plans" can be addressed within the scope of
the main SAOP patch, and/or Alena Rybakina's OR-to-SAOP transformation
patch. It seems unreasonable to make that the problem of the index
filter patch. (Actually, it's still possible that we ought to have
chosen a sequential scan but make the wrong choice -- that isn't
something we can just avoid, though.)

+ It's suboptimal (and weird) that we might sometimes rely on index
  filters to reduce index page accesses.
+ Index quals can achieve the same desirable outcome "robustly",
  because the index AM has all required context.
+ "Choice is confusion" -- we should always prefer one index path that
  "offers all of the advantages" over two or more hypothetical
  alternative index paths that can each only do one thing well.
+ More generally, we should try to avoid nonlinear cost differences
  between highly similar plans (e.g., bitmap index scan vs. index scan
  of the same index).
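The repeated-descent arithmetic behind these points can be sketched
with a toy Python model (the function names and all numbers are
hypothetical illustrations, not measurements): k primitive index scans
for k SAOP constants repeat the root-to-leaf descent k times, while a
single native SAOP scan descends once and then visits each needed page
at most once.

```python
def pages_for_primitive_scans(k_constants, tree_height, leaf_pages):
    # One root-to-leaf descent per primitive scan, plus the leaf pages
    # actually read -- upper-level pages are accessed again and again.
    return k_constants * tree_height + leaf_pages

def pages_for_single_saop_scan(tree_height, leaf_pages):
    # One descent, then every needed leaf page exactly once.
    return tree_height + leaf_pages

# Three constants, a two-level descent, one interesting leaf page:
print(pages_for_primitive_scans(3, 2, 1))  # 7 index page accesses
print(pages_for_single_saop_scan(2, 1))    # 3 index page accesses

# The gap grows linearly with the number of constants:
print(pages_for_primitive_scans(100, 2, 1))  # 201 index page accesses
```

Because the single-scan figure is a lower bound on any strategy's page
accesses, the model also illustrates why the SAOP plan never has to
trade index page accesses against heap page accesses.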
In nbtree (using Markus Winand's terminology, which we should arguably
have exposed in EXPLAIN output):

Access predicates > Index filter predicates

Within nbtree itself, we could say that SK_BT_REQFWD and SK_BT_REQBKWD
scan keys are both types of "access predicates". It's a bit more
complicated than that, because with inequalities you'll often have scan
keys that are required in one scan direction (e.g., the forward scan
case) but not required in the other direction (should the direction
change). BTEqualStrategyNumber scan keys are both SK_BT_REQFWD and
SK_BT_REQBKWD at the same time, regardless of other details. But
overall, SK_BT_REQFWD/SK_BT_REQBKWD scan keys indicate access
predicates, while anything else is merely an index filter predicate,
which affects neither the scan's initial position nor when the scan can
terminate.

Presumably we'd need to make != scan keys (discussed in example 1) into
index filter predicates in this sense, as part of any patch to teach
index AMs (likely just nbtree) about simple inequalities.
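As a rough illustration of that distinction, here is a Python sketch
(not the actual nbtree C code; the flag values are invented, though the
flag names mirror nbtree's): classification depends only on whether a
scan key is marked required in at least one scan direction.

```python
# Invented flag values; the names mirror nbtree's scan key flags.
SK_BT_REQFWD = 0x1   # required to continue a forward scan
SK_BT_REQBKWD = 0x2  # required to continue a backward scan

def classify(scan_key_flags):
    """A scan key required in at least one direction can position the
    scan and/or end it early; anything else can only filter tuples."""
    if scan_key_flags & (SK_BT_REQFWD | SK_BT_REQBKWD):
        return "access predicate"
    return "index filter predicate"

# Equality keys are required in both directions, regardless of other
# details; an inequality is typically required in only one direction;
# a hypothetical != scan key would carry neither flag, since it can
# neither position the scan nor terminate it.
print(classify(SK_BT_REQFWD | SK_BT_REQBKWD))  # access predicate
print(classify(SK_BT_REQFWD))                  # access predicate
print(classify(0))                             # index filter predicate
```

This matches the taxonomy above: the != scan keys from example 1 would
land in the "index filter predicate" bucket, even once nbtree learned
to evaluate them against index tuples.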