Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, benoit <benoit(at)hopsandfork(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date: 2024-03-06 00:50:18
Message-ID: CAH2-Wz=75d0Yx-s15QNPN2sogfZK4PF7xOQFjPS8=Ca7C0f9+g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 4, 2024 at 3:51 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> > + that searches for multiple values together. Queries that use certain
> > + <acronym>SQL</acronym> constructs to search for rows matching any value
> > + out of a list (or an array) of multiple scalar values might perform
> > + multiple <quote>primitive</quote> index scans (up to one primitive scan
> > + per scalar value) at runtime. See <xref linkend="functions-comparisons"/>
> > + for details.
>
> I don't think the "see <functions-comparisons> for details" is
> correctly worded: The surrounding text implies that it would contain
> details about in which precise situations multiple primitive index
> scans would be consumed, while it only contains documentation about
> IN/NOT IN/ANY/ALL/SOME.
>
> Something like the following would fit better IMO:
>
> + that searches for multiple values together. Queries that use certain
> + <acronym>SQL</acronym> constructs to search for rows matching any value
> + out of a list or array of multiple scalar values (such as those
> described in
> + <functions-comparisons> might perform multiple <quote>primitive</quote>
> + index scans (up to one primitive scan per scalar value) at runtime.

I think that there is supposed to be a closing parenthesis here? So
"... (such as those described in <functions-comparisons>) might
perform ...".

Works for me, if that's what you meant.
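
For what it's worth, here's the kind of query that paragraph is talking
about (purely illustrative; the table and column names are just borrowed
from the example further down):

-- A ScalarArrayOp qual: with a B-Tree index on "a", the scan may perform
-- up to one "primitive" index scan per array element (three here)
select * from multi_test where a in (1, 99, 182);
-- Equivalent spelling, using = ANY over an array
select * from multi_test where a = any (array[1, 99, 182]);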

> Then there is a second issue in the paragraph: Inverted indexes such
> as GIN might well decide to start counting more than one "primitive
> scan" per scalar value, because they may need to go through their
> internal structure more than once to produce results for a single
> scalar value; e.g. with queries WHERE textfield LIKE '%some%word%', a
> trigram index would likely use at least 4 descents here: one for each
> of "som", "ome", "wor", "ord".

Calling anything other than an executor invocation of
amrescan/index_rescan an "index scan" (or "primitive index scan") is a
bit silly IMV, since, as you point out, how (and how many times) the
index AM goes about descending the underlying physical index structure
is really just an implementation detail. GIN has more than one B-Tree,
so it's not like there's necessarily just one "index structure",
anyway. It seems to me that the only meaningful definition of "index
scan" is one tied directly to how the index AM gets invoked. That's a
logical, index-AM-agnostic concept, with a fixed relationship to how
EXPLAIN ANALYZE displays information.

But we're not living in a perfect world. The fact is that the
pg_stat_*_tables system views follow a definition that brings these
implementation details into the picture. Their definition of "index
scan" is probably a legacy of when SAOPs could only be executed in a
way that was totally opaque to the index AM (which is how it still
works for every non-nbtree index AM). Back then, an "index scan"
really was synonymous with an executor invocation of
amrescan/index_rescan, even when SAOPs were involved.
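
To be concrete, the counter at issue is idx_scan (illustrative query
only, reusing the table name from the example further down):

-- "Index scans" counted against a table's indexes, per the views'
-- historical definition
select relname, idx_scan from pg_stat_user_tables
where relname = 'multi_test';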

I've described the issues in this area (in the docs) in a way that is
most consistent with historical conventions. That seems to have the
fewest problems, despite everything I've said about it.

> > > All that really remains now is to research how we might integrate this
> > > work with the recently added continuescanPrechecked/haveFirstMatch
> > > stuff from Alexander Korotkov, if at all.
> >
> > The main change in v12 is that I've integrated both the
> > continuescanPrechecked and the haveFirstMatch optimizations. Both of
> > these fields are now page-level state, shared between the _bt_readpage
> > caller and the _bt_checkkeys/_bt_advance_array_keys callees (so they
> > appear next to the new home for _bt_checkkeys' continuescan field, in
> > the new page state struct).
>
> Cool. I'm planning to review the rest of this patch this
> week/tomorrow, could you take some time to review some of my btree
> patches, too?

Okay, I'll take a look again.

Attached is v13.

At one point Heikki suggested that I just get rid of
BTScanOpaqueData.arrayKeyData (which has been there for as long as
nbtree has had native support for SAOPs), and use
BTScanOpaqueData.keyData exclusively instead. I've finally got around
to doing that now.

These simplifications were enabled by my new approach within
_bt_preprocess_keys, described when I posted v12. v13 goes even
further than v12 did, by demoting _bt_preprocess_array_keys to a
helper function for _bt_preprocess_keys. That means that we do all of
our scan key preprocessing at the same time, at the start of _bt_first
(though only during the first _bt_first call, or, to be more precise,
the first such call per btrescan). If we need fancier
preprocessing/normalization for arrays, then it ought to be a lot
easier with this structure.

Note that we no longer need to have an independent representation of
so->qual_okay for array keys (the convention of setting
so->numArrayKeys to -1 for unsatisfiable array keys is no longer
required). There is no longer any need for a separate pass to carry
over the contents of BTScanOpaqueData.arrayKeyData to
BTScanOpaqueData.keyData, which was confusing.

Are you still interested in working directly on the preprocessing
stuff? I have a feeling that, back when I posted v12, I was slightly
too optimistic about how likely we were to get away with not doing
certain kinds of array preprocessing. It's true that the patch's
failure to recognize "partial redundancies/contradictions" hardly
matters with redundant equalities, but inequalities are another
matter. I'm slightly worried about cases like this one:

select * from multi_test where a in (1,99, 182, 183, 184) and a < 183;
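
To spell out the worry: the inequality contradicts the array elements
183 and 184, so with more complete preprocessing the scan's array keys
could be trimmed, as if the query had been written as:

select * from multi_test where a in (1, 99, 182) and a < 183;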

Maybe we need to do better with that. What do you think?

--
Peter Geoghegan

Attachment: v13-0001-Enhance-nbtree-ScalarArrayOp-execution.patch (application/x-patch, 160.0 KB)
