Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, benoit <benoit(at)hopsandfork(dot)com>
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date: 2023-07-27 13:59:51
Message-ID: CAEze2WhRef3bnfRbRbeYAnemxrixM-g1p3tKzOHHb6hP2Z4=5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 27 Jul 2023 at 06:14, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Wed, Jul 26, 2023 at 12:07 PM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > We could cache the last accessed leaf page across amrescan operations
> > to reduce the number of index traversals needed when the join key of
> > the left side is highly (but not necessarily strictly) correllated.
>
> That sounds like block nested loop join. It's possible that that could
> reuse some infrastructure from this patch, but I'm not sure.

My idea is not quite block nested loop join. It's more 'restart the
index scan at the location the previous index scan ended, if
heuristics say there's a good chance that might save us time'. I'd say
it is comparable to the fast tree descent optimization that we have
for endpoint queries, and comparable to this patch's scankey
optimization, but across AM-level rescans instead of internal rescans.

See also the attached prototype and loosely coded patch. It passes
tests, but it might not be without bugs.

The basic design of that patch is this: We keep track of how many
times we've rescanned, and the end location of the index scan. If a
new index scan hits the same page after _bt_search as the previous
scan ended, we register that. Those two values - num_rescans and
num_samepage - are used as heuristics for the following:

If 50% or more of rescans hit the same page as the end location of the
previous scan, we start saving the scan's end location's buffer into
the BTScanOpaque, so that the next _bt_first can check whether that
page might be the right leaf page, and if so, immediately go to that
buffer instead of descending the tree - saving one tree descent in the
process.

Further optimizations of this mechanism could easily be implemented by
e.g. only copying the min/max index tuples instead of the full index
page, reducing the overhead at scan end.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Attachment Content-Type Size
v1-0001-Cache-btree-scan-end-page-across-rescans-in-the-s.patch.cfbot-ignore application/octet-stream 8.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2023-07-27 14:00:31 Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Previous Message Ashutosh Bapat 2023-07-27 13:58:49 Memory consumption during partitionwise join planning