Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, benoit <benoit(at)hopsandfork(dot)com>
Subject:	Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date:	2023-07-27 13:59:51
Message-ID:	CAEze2WhRef3bnfRbRbeYAnemxrixM-g1p3tKzOHHb6hP2Z4=5A@mail.gmail.com
Lists:	pgsql-hackers

On Thu, 27 Jul 2023 at 06:14, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Wed, Jul 26, 2023 at 12:07 PM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > We could cache the last accessed leaf page across amrescan operations
> > to reduce the number of index traversals needed when the join key of
> > the left side is highly (but not necessarily strictly) correlated.
>
> That sounds like block nested loop join. It's possible that that could
> reuse some infrastructure from this patch, but I'm not sure.

My idea is not quite block nested loop join. It's more 'restart the
index scan at the location the previous index scan ended, if
heuristics say there's a good chance that might save us time'. I'd say
it is comparable to the fast tree descent optimization that we have
for endpoint queries, and comparable to this patch's scankey
optimization, but across AM-level rescans instead of internal rescans.

See also the attached patch, a loosely coded prototype. It passes
tests, but it might not be without bugs.

The basic design of that patch is this: We keep track of how many
times we've rescanned, and of the end location of the index scan. If,
after _bt_search, a new index scan lands on the same page on which the
previous scan ended, we register that. Those two values - num_rescans
and num_samepage - are used as heuristics for the following:

If 50% or more of rescans hit the same page as the end location of the
previous scan, we start saving the scan's end location's buffer into
the BTScanOpaque, so that the next _bt_first can check whether that
page might be the right leaf page, and if so, immediately go to that
buffer instead of descending the tree - saving one tree descent in the
process.
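
As a rough illustration of the bookkeeping described above (not the
code in the attached patch), the heuristic could be sketched as
follows. The struct, the function names and the integer LeafPage type
are hypothetical stand-ins; only num_rescans, num_samepage and the 50%
threshold come from the description. Whether the very first scan
counts as a rescan is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a leaf page identifier. */
typedef uint32_t LeafPage;
#define NO_PAGE UINT32_MAX

/* Hypothetical per-scan counters; names follow the description above. */
typedef struct RescanStats
{
    uint32_t num_rescans;    /* scan starts observed so far */
    uint32_t num_samepage;   /* starts that landed on the previous end page */
    LeafPage last_end_page;  /* leaf page on which the previous scan ended */
} RescanStats;

static void
rescan_stats_init(RescanStats *st)
{
    st->num_rescans = 0;
    st->num_samepage = 0;
    st->last_end_page = NO_PAGE;
}

/* Record the leaf page a new scan landed on after its tree descent. */
static void
rescan_stats_note_start(RescanStats *st, LeafPage start_page)
{
    st->num_rescans++;
    if (st->last_end_page != NO_PAGE && start_page == st->last_end_page)
        st->num_samepage++;
}

/* Record the leaf page on which the current scan ended. */
static void
rescan_stats_note_end(RescanStats *st, LeafPage end_page)
{
    st->last_end_page = end_page;
}

/* True once at least 50% of the observed scans hit the previous end page. */
static bool
rescan_should_cache_end_page(const RescanStats *st)
{
    return st->num_rescans > 0 &&
        2 * (uint64_t) st->num_samepage >= st->num_rescans;
}
```

A caller would invoke rescan_stats_note_start() after each descent,
rescan_stats_note_end() when the scan finishes, and only keep the end
page around while rescan_should_cache_end_page() returns true.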

Further optimizations of this mechanism could easily be implemented by
e.g. only copying the min/max index tuples instead of the full index
page, reducing the overhead at scan end.
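
A minimal sketch of that min/max idea, under heavy simplification:
keys are plain 64-bit integers here, whereas real nbtree keys are
index tuples compared through the operator class's comparison
functions, and a real check would also have to cope with the page
changing between scans. All names below are hypothetical. The lower
bound is treated as exclusive as a conservative choice, on the
assumption that duplicates of the lowest key may also live on the
page to the left.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache of the key range seen on the scan's end page. */
typedef struct CachedLeafBounds
{
    bool     valid;    /* false until a scan end has been recorded */
    uint32_t page;     /* identifier of the cached leaf page */
    int64_t  min_key;  /* lowest key seen on that page */
    int64_t  max_key;  /* highest key seen on that page */
} CachedLeafBounds;

/* Save only the bounds at scan end, rather than the whole page. */
static void
cached_bounds_save(CachedLeafBounds *c, uint32_t page,
                   int64_t min_key, int64_t max_key)
{
    c->valid = true;
    c->page = page;
    c->min_key = min_key;
    c->max_key = max_key;
}

/*
 * Could a scan starting at 'key' begin on the cached page?  If not, the
 * caller falls back to a normal descent from the root.
 */
static bool
cached_bounds_may_contain(const CachedLeafBounds *c, int64_t key)
{
    return c->valid && key > c->min_key && key <= c->max_key;
}
```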

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Attachment	Content-Type	Size
v1-0001-Cache-btree-scan-end-page-across-rescans-in-the-s.patch.cfbot-ignore	application/octet-stream	8.6 KB

In response to

Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan at 2023-07-27 04:13:47 from Peter Geoghegan

Responses

Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan at 2023-07-31 16:34:47 from Peter Geoghegan
