Re: index prefetching

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2026-03-10 22:47:52
Message-ID: y5wp4uxudeajyljuzdm4cmqvwmzlujwzkxbadimoa64cmybgjp@5dd7le2jxc5m
Lists: pgsql-hackers

Hi,

On 2026-03-10 16:57:35 -0400, Peter Geoghegan wrote:
> On Fri, Feb 27, 2026 at 6:52 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > This is a huge change. Is there a chance we can break it up into more
> > manageable chunks?
>
> Attached is v12, which has revisions that address most of your
> feedback items. It also includes items that address problems that I
> noticed during performance validation work.
>
> Highlights:
>
> * Substantial revisions that give table AMs and index AMs direct
> control over batch layout -- without giving up on batch
> recycling/caching. This is essentially what you (Andres) requested
> because the design from v11 was not sufficiently AM agnostic. In
> particular:
>
> - Table AMs now control the size and layout of visibility information
> (in practice heapam uses this to store per-item visibility state from
> the visibility map).
>
> - Index AMs have their own opaque state for things like sibling link
> block numbers, avoiding the assumption that other index AMs supporting
> amgetbatch will need to work like nbtree and hash as regards how they
> navigate to the next index page/index keyspace associated with each
> batch.

Nice!

> * No more read stream yielding. Numerous new patches from Andres are
> now included, which helps with this. In particular, "WIP: read_stream:
> Only increase distance when waiting for IO" fixes the problematic
> regression in an adversarial query -- the one that prompted me to
> invent yielding in the first place. As a result of all this, the read
> stream callback added by the prefetching commit itself is now
> substantially simpler than it was in v11.

Yay.

> * There are now a couple of extra patches created by breaking things
> into more distinct commits. Namely, there's a new "heapam: Track heap
> block in IndexFetchHeapData using xs_blk" commit, as well as a new
> "Make IndexScanInstrumentation a pointer in executor scan nodes"
> commit.

Yay^2.

> * Moreover, some commits now appear in a slightly different order,
> prioritizing work closer to being committable; those commits now come
> first.

Yay^3.

> * New commit "Use simple hash for PrivateRefCount" addresses some of
> the problems we were seeing with PrivateRefCount performance. This
> generic optimization addresses an existing problem that would
> otherwise be much worse with the index prefetching work in place.

Let's get that in soon.

Alexandre Felipe posted an implementation of this in
https://postgr.es/m/CAE8JnxNTETEUiAOF31%3D_yo%3DpvyAi9npOeJfcTvEJJbi4vomtYA%40mail.gmail.com

I don't agree with many of the other changes, but the simplehash conversion
contains an interesting piece - the ability to avoid the status field. I'd
encourage Alexandre to upstream that separately from this thread (and also
separately from the rest of the patches in the above thread).
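
For readers unfamiliar with the idea, here is a minimal sketch of a hash table
that needs no separate per-entry status field, because a reserved key value
marks a free slot. It is not Alexandre's patch and not the simplehash.h API.
Every name below (RefCountEntry, RefCountTable, refcount_lookup,
refcount_insert, EMPTY_KEY) is hypothetical, and it assumes that key 0 is never
valid, in the way InvalidBuffer is never a real buffer. Deletion is left out;
without a tombstone state it would presumably need something like
backward-shift deletion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch; none of these names come from PostgreSQL. */
#define REFCOUNT_TABLE_SIZE 16	/* power of two, fixed size for brevity */
#define EMPTY_KEY 0				/* assumption: 0 is never a valid key */

typedef struct RefCountEntry
{
	int32_t		key;			/* EMPTY_KEY marks a free slot, so no status field */
	int32_t		refcount;
} RefCountEntry;

typedef struct RefCountTable
{
	RefCountEntry slots[REFCOUNT_TABLE_SIZE];
	int			nused;
} RefCountTable;

static void
refcount_table_init(RefCountTable *t)
{
	/* zeroing sets every key to EMPTY_KEY */
	memset(t, 0, sizeof(*t));
}

static uint32_t
refcount_hash(int32_t key)
{
	/* multiplicative hash; any reasonable integer hash would do */
	return (uint32_t) key * 2654435761u;
}

/* Linear-probing lookup; returns NULL when the key is absent. */
static RefCountEntry *
refcount_lookup(RefCountTable *t, int32_t key)
{
	uint32_t	i = refcount_hash(key) & (REFCOUNT_TABLE_SIZE - 1);

	assert(key != EMPTY_KEY);
	for (int n = 0; n < REFCOUNT_TABLE_SIZE; n++)
	{
		RefCountEntry *e = &t->slots[i];

		if (e->key == key)
			return e;
		if (e->key == EMPTY_KEY)
			return NULL;		/* a free slot ends the probe sequence */
		i = (i + 1) & (REFCOUNT_TABLE_SIZE - 1);
	}
	return NULL;				/* table full and key not present */
}

/* Find or create the entry for key; returns NULL when the table is full. */
static RefCountEntry *
refcount_insert(RefCountTable *t, int32_t key)
{
	uint32_t	i = refcount_hash(key) & (REFCOUNT_TABLE_SIZE - 1);

	assert(key != EMPTY_KEY);
	for (int n = 0; n < REFCOUNT_TABLE_SIZE; n++)
	{
		RefCountEntry *e = &t->slots[i];

		if (e->key == key)
			return e;
		if (e->key == EMPTY_KEY)
		{
			e->key = key;		/* claiming the slot is just writing the key */
			e->refcount = 0;
			t->nused++;
			return e;
		}
		i = (i + 1) & (REFCOUNT_TABLE_SIZE - 1);
	}
	return NULL;
}
```

The point of the sketch is only that the key doubles as the occupancy marker,
which makes each entry smaller and the probe loop a single comparison per slot.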

> However, I have NOT yet acted on a few feedback items from Andres:
>
> * I still don't know what Andres meant about requiring table AMs to
> free batch index page buffer pins representing a modularity violation.
> I don't see how we can reasonably avoid it while still preserving the
> guarantees needed to safely drop buffer pins eagerly during index-only
> scans that require prefetching.
>
> * I'm also not at all sure what Andres meant about index AMs like hash
> not holding onto their own buffer pins, given that prefetching uses a
> read stream sensitive to the number of buffer pins the backend holds.

I tried to respond in
https://postgr.es/m/vbb4naf2tvm2tm7yoml54pzvrmn77p4nvq4awfa4wufc3hn7qx%40mof5q6li3xzv
to explain my concerns / what I think needs to happen.

Greetings,

Andres Freund
