Re: index prefetching

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2025-09-10 22:24:16
Message-ID: CAH2-WzkAC_gWQgc3MV3MWBx2EUaE4TfPk8XQPkJgs1R_qNvo=A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 4, 2025 at 2:55 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> Aren't plenty of real-world data sets correlated, but not perfectly?

Attached is the latest revision of the prefetching patch, taken from
the shared branch that Tomas and I have been working on for some
weeks.

This revision is the first "official revision" that uses the complex
approach, which we agreed was the best approach right before we
started collaborating through this shared branch. While Tomas and I
have posted versions of this "complex" approach at various times,
those were "unofficial" previews of different approaches. Whereas this
is the latest official patch revision of record, that should be tested
by CFTester for the prefetch patch's CF entry, etc.

We haven't done a good job of maintaining an unambiguous, easy to test
"official" CF entry patch before now. That's why I'm being explicit
about what this patch revision represents. It's the shared work of
Tomas and I; it isn't some short-term experimental fork. Future
revisions will be incremental improvements on what I'm posting now.

Our focus has been on fixing a variety of regressions that came to
light following testing by Tomas. There are a few bigger changes that
are intended to fix these regressions, plus lots of small changes.

There's too many small changes to list. But the bigger changes are:

* We're now carrying Andres' patch [1] that deals with inefficiencies
on the read stream side [2]. We need this to get decent performance
with certain kinds of index scans where the same heap page buffer
needs to be read multiple times in close succession.

* We now delay prefetching/creating a new read stream until after
we've already read one index batch, with the goal of avoiding
regressions on cheap, selective queries (e.g., pgbench SELECT). This
optimization has been referred to as the "priorbatch" optimization
earlier in this thread.

* The third patch is a new one, authored by Tomas. It aims to
ameliorate nestloop join regressions by caching memory used to store
batches across rescans.

This is still experimental.

* The regression that we were concerned about most recently [3][4] is
fixed by a new mechanism that sometimes disables prefetching/the read
stream some time prefetching begins, having already read a small
number of batches with prefetching -- the
INDEX_SCAN_MIN_TUPLE_DISTANCE optimization.

This is also experimental. But it does fully fix the problem at hand,
without any read stream changes. (This is part of the main prefetching
patch.)

This works like the "priorbatch" optimization, but in reverse. We
*unset* the scan's read stream when our INDEX_SCAN_MIN_TUPLE_DISTANCE
test shows that prefetching hasn't worked out (as opposed to delaying
starting it up until it starts to look like prefetching might help).
Like the "priorbatch" optimization, this optimization is concerned
with fixed prefetching costs that cannot possibly pay for themselves.

Note that we originally believed that the regression in question
[3][4] necessitated more work on the read stream side, to directly
account for the way that we saw prefetch distance collapse to 2.0 for
the entire scan. But our current thinking is that the regression in
question occurs with scans where wholly avoiding prefetching is the
right goal. Which is why, tentatively, we're addressing the problem
within indexam.c itself (not in the read stream), by adding this new
INDEX_SCAN_MIN_TUPLE_DISTANCE test to the read stream callback. This
means that various experimental read stream distance patches [3][5]
that initially seemed relevant no longer appear necessary (and so
aren't included in this new revision at all).

Much cleanup work remains to get the changes I just described in
proper shape (to say nothing about open items that we haven't made a
start on yet, like moving the read stream out of indexam.c and into
heapam). But it has been too long since the last revision. I'd like to
establish a regular cadence for posting new revisions of the patch
set.

[1] https://postgr.es/m/6butbqln6ewi5kuxz3kfv2mwomnlgtate4mb4lpa7gb2l63j4t@stlwbi2dvvev
[2] https://postgr.es/m/kvyser45imw3xmisfvpeoshisswazlzw35el3fq5zg73zblpql@f56enfj45nf7
[3] https://postgr.es/m/8f5d66cf-44e9-40e0-8349-d5590ba8efb4@vondra.me
[4] https://github.com/tvondra/postgres/blob/index-prefetch-master/microbenchmarks/tomas-weird-issue-readstream.sql
[5] https://postgr.es/m/CA+hUKG+9Qp=E5XWE+_1UPCxULLXz6JrAY=83pmnJ5ifupH-NSA@mail.gmail.com

--
Peter Geoghegan

Attachment Content-Type Size
v20250910-0001-bufmgr-aio-Prototype-for-not-waiting-for-a.patch application/x-patch 6.3 KB
v20250910-0003-Reduce-malloc-free-traffic-by-caching-batc.patch application/x-patch 10.3 KB
v20250910-0002-Add-amgetbatch-interface-for-index-scan-pr.patch application/x-patch 157.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Previous Message Jacob Champion 2025-09-10 21:46:40 Re: OAuth client code doesn't work with Google OAuth