Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, benoit <benoit(at)hopsandfork(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date: 2023-11-28 17:19:21
Message-ID: CAH2-WzkSZEOy_RaqSiBKnVH=8rv-6dhFWfbHaRPfOxjyMAcZXA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 28, 2023 at 4:29 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> I haven't looked at the code, but I decided to do a bit of blackbox perf
> and stress testing, to get some feeling of what to expect in terms of
> performance improvements, and see if there happen to be some unexpected
> regressions. Attached is a couple simple bash scripts doing a
> brute-force test with tables of different size / data distribution,
> number of values in the SAOP expression, etc.

My own stress-testing has focussed on the two obvious extremes for
this patch, using variants of pgbench with SAOP SELECTs on
pgbench_accounts:

1. The case where there is almost no chance of finding any two index
tuples together on the same page, because the constants are completely
random. This workload makes the patch's attempts at "coalescing
together" index page reads pure overhead, with no possible benefit.
Obviously that's a cost that needs to be kept under control.

2. The case where there are 255 of tuples with distinct values that
are clustered together (both in the key space and in physical index
pages). Usually they'll span two index pages, but they might all fit
together on one index page, allowing us to descend to it directly and
read it only once.

With 32 clients, I typically see a regression of about 1.5% for the
first case relative to master, measured in throughput/TPS. The second
case typically sees throughput that's ~4.8x master (i.e. a ~380%
increase). I consider both of these extremes to be fairly unrealistic.
With fewer array constants, the speed-ups you'll see in sympathetic
cases are still very significant, but nothing like 4.8x. They're more
like the 30% numbers that you saw.

As you know, I'm not actually all that excited about cases like 2 --
it's not where users are likely to benefit from the patch. The truly
interesting cases are those cases where we can completely avoid heap
accesses in the first place (not just *repeat* accesses to the same
index pages), due to the patch's ability to consistently use index
quals rather than filter quals. It's not that hard to show cases where
there are 100x+ fewer pages accessed -- often with cases have very few
array constants. It's just that these cases aren't that interesting
from a performance validation point of view -- it's obvious that
filter quals are terrible.

Another thing that the patch does particularly well on is cases where
the array keys don't have any matches at all, but there is significant
clustering (relatively common when multiple SAOPs are used as index
quals, which becomes far more likely due to the planner changes). We
don't just skip over parts of the index that aren't relevant -- we
also skip over parts of the arrays that aren't relevant. Some of my
adversarial test cases that take ~1 millisecond for the patch to
execute will practically take forever on master (I had to have my test
suite not run those tests against master). You just need lots of array
keys.

What master does on those adversarial cases with billions of distinct
combinations of array keys (not that unlikely if there are 4 or 5
SAOPs with mere hundreds or thousands of array keys each) is so
inefficient that we might as well call it infinitely slower than the
patch. This is interesting to me from a performance
robustness/stability point of view. The slowest kind of SAOP index
scan with the patch becomes a full index scan -- just as it would be
if we were using any type of non-SOAP qual before now. The worst case
is a lot easier to reason about.

> I'm not convinced this is a problem we have to solve. It's possible it
> only affects cases that are implausible in practice (the script forces a
> particular scan type, and maybe it would not be picked in practice). But
> maybe it's fixable ...

I would expect the patch to do quite well (relative to what is
actually possible) on cases like the two extremes that I've focussed
on so far. It seems possible that it will do less well on cases that
are somewhere in the middle (that also have lots of distinct values on
each page).

We effectively do a linear search on a page that we know has at least
one more match (following a precheck that uses the high key). We hope
that the next match (for the next array value) closely follows an
initial match. But what if there are only 2 or 3 matches on each leaf
page, that are spaced relatively far apart? You're going to have to
grovel through the whole page.

It's not obvious that that's a problem to be fixed -- we're still only
descending the index once and still only locking the leaf page once,
so we'll probably still win relative to master. And it's not that easy
to imagine beating a linear search -- it's not like there is just one
"next value" to search for in these cases. But it's something that
deserves further consideration.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2023-11-28 17:23:40 Re: [HACKERS] Changing references of password encryption to hashing
Previous Message Nathan Bossart 2023-11-28 17:18:32 Re: archive modules loose ends