RE: MDAM techniques and Index Skip Scan patch

From: Floris Van Nee <florisvannee(at)Optiver(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jesper Pedersen" <jesper(dot)pedersen(at)redhat(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: MDAM techniques and Index Skip Scan patch
Date: 2021-10-23 19:30:47
Message-ID: 2f26e1e475364ab7b23f466e4a4f4392@opammb0562.comp.optiver.com

Great to see some interest in the skip scan patch series again!

> Like many difficult patches, the skip scan patch is not so much troubled by
> problems with the implementation as it is troubled by
> *ambiguity* about the design. Particularly concerning how skip scan meshes
> with existing designs, as well as future designs -- particularly designs for
> other MDAM techniques. I've started this thread to have a big picture
> conversation about how to think about these things. Many other MDAM
> techniques also seem highly appealing.

I think it is good to have this discussion. In my opinion, Postgres could make really good use of some of the described MDAM techniques.

> Much of the MDAM stuff is for data warehousing use-cases, while skip
> scan/loose index scan is seen as more of an OLTP thing. But they are still
> related, clearly.

FWIW, I think skip scan is very relevant to data warehousing use cases - which is why the TimescaleDB people in your reference [2] already implemented a simple form of it for their extension. Skip scan is a really useful feature for large data sets. However, I agree it is only one part of the bigger MDAM picture.

>
> My general concern is that the skip scan patch may currently be structured in
> a way that paints us into a corner, MDAM-wise.
>

One of the concerns I raised before was that the patch may take too simplistic an approach in some places, which would make it difficult to adopt more complex optimizations in the future. One concrete example can be illustrated by a different query on the sales table from the paper:

SELECT DISTINCT dept, date FROM sales WHERE item_class = 100

This should skip with a prefix of (dept, date). Suppose we're at (dept, date) = (1, 2021-01-01) and the scan is skipping to the next prefix. The patch only implements what the MDAM paper describes as the 'probing' step: it finds the beginning of the next prefix, for example (dept, date, item_class) = (1, 2021-01-02, 1). From there onwards, it just scans the index until it finds item_class=100. What it should do, however, is first 'probe' for the next prefix value and then skip directly to (1, 2021-01-02, 100), skipping item_class 1-99 altogether. If it doesn't support this, skip scan can have quite unpredictable performance, because sometimes it ends up scanning through most of the index where it should be skipping.
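To make the intended access pattern concrete, here is a small sketch of the skip sequence for that query. The index definition is my own assumption for illustration; any btree leading with (dept, date, item_class) shows the same thing:

-- assumed index, for illustration only
CREATE INDEX sales_skip_idx ON sales (dept, date, item_class);

-- desired skip sequence for the DISTINCT query above, starting from
-- (dept, date) = (1, '2021-01-01'):
--   1. probe: find the first index tuple whose (dept, date) prefix is greater
--      than (1, '2021-01-01'), e.g. (1, '2021-01-02', 1)
--   2. skip again within that prefix, directly to (1, '2021-01-02', 100),
--      instead of scanning item_class 1..99 tuple by tuple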

A while ago, I spent quite some time trying to come up with an implementation that works in this more general case. The nice thing is that with such a generic implementation, you get almost all the features from the MDAM paper for free. I focused on the executor code, not on the planner code - for the DISTINCT skip part, the planner code is very similar to the original patch, and I hacked in a way to make it choose a 'skip scan' for non-DISTINCT queries as well, for testing purposes. For this discussion about MDAM, the planner part is less relevant though. There's still a lot of discussion and work needed on the planner side too, but I think that can happen completely independently.

I originally posted the more generic patch in [1], together with some technical considerations. That was quite a while ago, so it obviously doesn't apply to master anymore; I've therefore attached a rebased version. Unfortunately, it's still based on an older version of the UniqueKeys patch - but since that patch is all planner machinery as well, it doesn't matter so much for the discussion about the executor code.

I believe that if we want something that fits better with future MDAM use cases, we should take a closer look at the executor code of this patch to drive the discussion. The logic is definitely more complex than in the original patch, but I believe it is also more flexible and future-proof.

>
> Another more concrete concern about the patch series comes from the
> backwards scan stuff. This is added by a later patch in the patch series, "v39-
> 0004-Extend-amskip-implementation-for-Btree.patch". It strikes me as a bad
> thing that we cannot just do leaf-page-at-a-time processing, without usually
> needing to hold a pin on the leaf page.
> After all, ordinary backwards scans manage to avoid that today, albeit by
> using trickery inside _bt_walk_left(). MDAM-style "Maintenance of Index
> Order" (as described on page 8) seems like a good goal for us here. I don't
> like the idea of doing ad-hoc duplicate TID elimination inside nbtree, across
> calls made from the executor (whether it's during backwards skip scans, or at
> any other time). Not because it seems to go against the approach taken by
> the MDAM paper (though it does); just because it seems kludgy. (I think that
> Tom felt the same way about the TID deduplication stuff in his own patch
> back in 2018,
> too.)

It's worth mentioning that the patch I attached does proper 'leaf-page-at-a-time' processing, so it avoids the problem you describe with v39. It is implemented in the same way as a "regular" index scan: we process the full leaf page and store the matching tuples in local state. If a DISTINCT scan wants to skip, we first check our local state to see whether the skip can be satisfied by the matched tuples from the current page. That avoids double work, as well as the need to look at the same page again.

> Open question: What does all of this MDAM business mean for
> ScalarArrayOpExpr, if anything?
>

This is actually a really interesting combination. I think, ideally, you'd get rid of it and provide full support for it with the skip-based approach (essentially, ScalarArrayOpExpr already does some form of skipping - it turns x IN (1, 2, 3) into three separate index scans on x).
However, even without doing any work on it, it already interacts nicely with the skip-based approach.
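To make that existing behaviour concrete, here is a minimal illustration (the index definition is an assumption on my side, just for the example):

-- assuming a btree index on (store, item_class), a ScalarArrayOpExpr qual like
select count(*)
from sales
where store = 250 and item_class in (20, 30, 50);
-- is executed as three consecutive primitive index scans, one per array
-- element: (store, item_class) = (250, 20), (250, 30), (250, 50).
-- Conceptually, that is already a hard-coded form of skipping over item_class.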

As an example, here are some queries based on the 'sales' table of the paper with some data in it (18M rows total, see the sales_query.sql attachment for the full example):

-- terminology from paper: "intervening range predicate"
select date, sum(total_sales)
from sales
where dept between 2 and 3 and date between '2021-01-05' and '2021-01-10' and item_class=20 and store=250
group by dept, date
;
Patch: Execution Time: 0.368 ms
Master: Execution Time: 416.792 ms

-- terminology from paper: "missing key predicate"
select date, sum(total_sales)
from sales
where date between '2021-01-05' and '2021-01-10' and item_class=20 and store=250
group by dept, date
;
Patch: Execution Time: 0.667 ms
Master: Execution Time: 654.684 ms

-- terminology from paper: "IN lists"
-- this is similar to the query from your example with an IN list
-- in the current patch, this query is executed as a skip scan with skip prefix (dept, date), combined with the ScalarArrayOpExpr for item_class IN (20, 30, 50)
select date, sum(total_sales)
from sales
where date between '2021-01-05' and '2021-01-10' and item_class in (20, 30, 50) and store=250
group by dept, date
;
Patch: Execution Time: 1.767 ms
Master: Execution Time: 629.792 ms

The other MDAM optimizations mentioned in the paper (NOT =, general OR) are not implemented, but they don't seem to conflict with the current implementation at all - they appear to be mostly a matter of transforming the expressions into the right form.
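As an example of the kind of rewrite involved (my own illustration, not something the patch does today), a NOT = predicate is just two ranges on the same column:

-- NOT = rewritten as two ranges:
select date, sum(total_sales)
from sales
where dept between 2 and 3
  and item_class <> 20   -- equivalent to: item_class < 20 OR item_class > 20
  and store = 250
group by dept, date
;
-- after the rewrite, the skip machinery can treat item_class as two key
-- ranges, (-infinity, 20) and (20, +infinity), and visit them per prefix.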

From the attached patch series, v9-0001/v9-0002 are the UniqueKeys patches, which focus on the planner. v2-0001 is Dmitry's patch that extends them to make it possible to use UniqueKeys for the skip scan. Both are unfortunately still older patches, but because they are planner machinery they are less relevant to the executor discussion here.
Patch v2-0002 contains most of my work: it introduces all the executor logic for the skip scan and hooks up the planner so that DISTINCT queries use it.
Patch v2-0003 is a planner hack that makes the planner pick a skip scan at virtually every opportunity. This also enables the skip scan for the queries above that don't have a DISTINCT but where it could still be useful.

-Floris

[1] https://www.postgresql.org/message-id/c5c5c974714a47f1b226c857699e8680@opammb0561.comp.optiver.com

Attachment Content-Type Size
sales_query.sql application/octet-stream 1.0 KB
v9-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch application/octet-stream 4.8 KB
v9-0002-Introduce-UniqueKey-attributes-on-RelOptInfo-stru.patch application/octet-stream 59.4 KB
v2-0001-Extend-UniqueKeys.patch application/octet-stream 13.0 KB
v2-0002-Index-skip-scan.patch application/octet-stream 230.9 KB
v2-0003-Support-skip-scan-for-non-distinct-scans.patch application/octet-stream 18.8 KB
