Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, benoit <benoit(at)hopsandfork(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date: 2024-03-07 21:07:28
Message-ID: CAH2-WzngJFKNQ2kN2Ae-Mu_DkKdxebLJhmAPcqMO_Mj5zxU_rA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 6, 2024 at 4:51 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> To clarify, what I mean here is that merging the changes of both the
> SAOPs changes and the removal of arrayKeyData seems to increase the
> patch size and increases the maximum complexity of each component
> patch's review.

Removing arrayKeyData probably makes the patch very slightly smaller,
actually. But even if it's really the other way around, I'd still like
to get rid of it as part of the same commit as everything else. It
just makes sense that way.

> Separate patches may make this more reviewable, or not, but no comment
> was given on why it is better to merge the changes into a single
> patch.

Fair enough. Here's why:

The original SAOP design (commit 9e8da0f7, "Teach btree to handle
ScalarArrayOpExpr quals natively") added a layer of indirection
between scan->keyData (input scan keys) and so->keyData (output scan
keys): it added another scan key array, so->arrayKeyData. There was
array-specific preprocessing in _bt_preprocess_array_keys, that
happened before the first primitive index scan even began -- that
transformed our true input scan keys (scan->keyData) into a copy of
the array with limited amounts of array-specific preprocessing already
performed (so->arrayKeyData).

This made a certain amount of sense at the time, because
_bt_preprocess_keys was intended to be called once per primitive index
scan. Kind of like the inner side of a nested loop join's inner index
scan, where we call _bt_preprocess_keys once per inner-side
scan/btrescan call. (Actually, Tom's design has us call _bt_preprocess
once per primitive index scan per btrescan call, which might matter in
those rare cases where the inner side of a nestloop join had SAOP
quals.)

What I now propose to do is to just call _bt_preprocess_keys once per
btrescan (actually, it's still called once per primitive index scan,
but all calls after the first are just no-ops after v12 of the patch).
This makes sense because SAOP array constants aren't like nestloop
joins with an inner index scans, in one important way: we really can
see everything up-front. We can see all of the array elements, and
operate on whole arrays as necessary during preprocessing (e.g.,
performing the array merging thing I added to
_bt_preprocess_array_keys).

It's not like the next array element is only visible to prepocessing
only after the outer side of a nestloop join runs, and next calls
btrescan -- so why treat it like that? Conceptually, "WHERE a = 1" is
almost the same thing as "WHERE a = any('{1,2}')", so why not use
essentially the same approach to preprocessing in both cases? We don't
need to copy the input keys into so->arrayKeyData, because the
indirection (which is a bit like a fake nested loop join) doesn't buy
us anything.

v13 of the patch doesn't quite 100% eliminate so->arrayKeyData. While
it removes the arrayKeyData field from the BTScanOpaqueData struct, we
still use a temporary array (accessed via a pointer that's just a
local variable) that's also called arrayKeyData. And that also stores
array-preprocessing-only copies of the input scan keys. That happens
within _bt_preprocess_keys.

So we do "still need arrayKeyData", but we only need it for a brief
period at the start of the index scan. It just doesn't make any sense
to keep it around for longer than that, in a world where
_bt_preprocess_keys "operates directly on arrays". That only made
sense (a bit of sense) back when _bt_preprocess_keys was subordinate
to _bt_preprocess_array_keys, but it's kinda the other way around now.
We could probably even get rid of this remaining limited form of
arrayKeyData, but that doesn't seem like it would add much.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-03-07 21:36:00 Re: Popcount optimization using AVX512
Previous Message Peter Geoghegan 2024-03-07 20:34:52 Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan