Re: index prefetching

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2026-04-03 05:17:14
Message-ID: CAH2-WzkiCK=wELiXPgriN4r7cJzGb3Xg48E9YHrFEyEPTkynOw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 1, 2026 at 6:50 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Attached is v20, which does things this way. Now it's completely clear
> that the scan's batch ring buffer is fully controlled by the table AM.

Attached is v21, which advances things in the direction established in
v20. It also fixes bitrot -- we no longer carry certain read stream
related patches from Andres that were committed since I posted v20 a
couple of days ago.

> There are now only 2 remaining indexbatch.c functions that get called
> from indexam.c. Since these are both trivial, highly generic
> functions, restructuring things so that heapam would call these two
> seemed unnecessary.

I changed my mind about this: I now think it's better if indexam.c
never directly calls indexbatch.c. Just for flexibility and
extensibility. That's how it's done in v21.

In v21, the code in indexbatch.c is about 80% table AM code, 20% index
AM code -- nothing more. That seems a bit cleaner to me, because the
table AM knowing that the batchringbuf must have already been
initialized on its behalf is no longer requires (nothing like that is
required). The table AM fully owns batchringbuf, and initializes it
for itself where needed.

This means I've had to add a new table AM callback to take a mark,
which isn't strictly necessary for heapam -- the heapam implementation
just calls the relevant indexbatch.c routine, which in practice
handles everything that's needed by heapam on its own. My guess is
that Andres will slightly prefer things that way, since of course it's
possible that some external table AM will need non-trivial code just
to take a mark (restoring a mark has to be a bit more complicated to
account for the read stream, which is why v20 already had one).

We're losing 2 callbacks from the index AM API (for mark/restore), but
gaining 2 very similar ones in the table AM API.

We now expect the table AM's index_fetch_begin callback to set
IndexScanDescData.xs_getnext_slot for the scan itself -- another thing
that happens in the table AM rather than in indexam.c in v21. That way
we don't need 4 extra TableAmRoutine callbacks (one for every
combination of amgettuple, amgetbatch, index-only, plain).

I don't think requiring exactly 4 slot-based variants set within
TableAmRoutine, as was the case with the last several patch versions,
was sensible. In v21, the table AM can provide as few or as many
callbacks as it likes (obviously at least one callback must exist).
This works fine, provided the table AM is sure that the callback will
be correct for the duration of that particular scan, based on what is
known about the scan when it begins. (In practice this means the table
AM exploiting knowledge of whether the scan is index-only or plain, or
whether it's amgettuple or amgetbatch -- but other variations might
make sense in the future.)

--
Peter Geoghegan

Attachment Content-Type Size
v21-0012-Hacky-implementation-of-making-read_stream_reset.patch application/octet-stream 5.1 KB
v21-0015-WIP-aio-bufmgr-Fix-race-condition-leading-to-dea.patch application/octet-stream 3.1 KB
v21-0013-WIP-read-stream-Split-decision-about-look-ahead-.patch application/octet-stream 14.8 KB
v21-0001-Rename-heapam_index_fetch_tuple-argument-for-cla.patch application/octet-stream 2.6 KB
v21-0014-aio-Fix-pgaio_io_wait-for-staged-IOs-B.patch application/octet-stream 6.3 KB
v21-0011-read_stream-Only-increase-distance-when-waiting-.patch application/octet-stream 4.2 KB
v21-0009-Make-hash-index-AM-use-amgetbatch-interface.patch application/octet-stream 47.3 KB
v21-0008-heapam-Add-index-scan-I-O-prefetching.patch application/octet-stream 46.1 KB
v21-0010-aio-io_uring-Trigger-async-processing-for-large-.patch application/octet-stream 7.2 KB
v21-0007-heapam-Optimize-pin-transfers-during-index-scans.patch application/octet-stream 6.7 KB
v21-0006-Add-interfaces-that-enable-index-prefetching.patch application/octet-stream 239.4 KB
v21-0003-Add-slot-based-table-AM-index-scan-interface.patch application/octet-stream 78.4 KB
v21-0004-heapam-Track-heap-block-in-IndexFetchHeapData.patch application/octet-stream 4.7 KB
v21-0005-heapam-Keep-buffer-pins-across-index-rescans.patch application/octet-stream 3.9 KB
v21-0002-Move-heapam_handler.c-index-scan-code-to-new-fil.patch application/octet-stream 21.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2026-04-03 05:37:03 Re: SQL/PGQ: All properties reference
Previous Message Pavel Stehule 2026-04-03 05:15:41 Re: proposal: schema variables