Re: index prefetching

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2025-08-17 17:30:14
Message-ID: CAH2-Wz=nPRkQoURxM+4LnAJvpdFX4sfkSEsHBr19XQoM0Rdn_g@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> As far as I know, we only have the following unambiguous performance
> regressions (that clearly need to be fixed):
>
> 1. This issue.
>
> 2. There's about a 3% loss of throughput on pgbench SELECT.

Update: I managed to fix the performance regression with pgbench
SELECT (regression 2). Since Andres' patch fixes the other regression
(regression 1), we no longer have any known performance regression
(though I don't doubt that they still exist somewhere). I've also
added back the enable_indexscan_prefetch testing GUC (Andres asked me
to do that a few weeks back). If you set
enable_indexscan_prefetch=false, btgetbatch performance is virtually
identical to master/btgettuple.

A working copy of the patchset with these revisions is available from:
https://github.com/petergeoghegan/postgres/tree/index-prefetch-batch-v1.6

The solution to the pgbench issue was surprisingly straightforward.
Profiling showed that the regression was caused by the added overhead
of using the read stream, for queries where prefetching cannot
possibly help -- such small startup costs are relatively noticeable
with pgbench's highly selective scans. It turns out that it's possible
to initially avoid using a read stream, while still retaining the
option of switching over to using a read stream later on. The trick to
fixing the pgbench issue was delaying creating a read stream for long
enough for the pgbench queries to never need to create one, without
that impacting queries that at least have some chance of benefiting
from prefetching.

The actual heuristic I'm using to decide when to start the read stream
is simple: only start a read stream right after the scan's second
batch is returned by amgetbatch, but before we've fetched any heap
blocks related to that second batch (start using a read stream when
fetching new heap blocks from that second batch). It's possible that
that heuristic isn't sophisticated enough for other types of queries.
But either way the basic structure within indexam.c places no
restrictions on when we start a read stream. It doesn't have to be
aligned with amgetbatch-wise batch boundaries, for example (I just
found that structure convenient).

I haven't spent much time testing this change, but it appears to work
perfectly (no pgbench regressions, but also no regressions in queries
that were already seeing significant benefits from prefetching). I'd
feel better about all this if we had better testing of the read stream
invariants by (say) adding assertions to index_scan_stream_read_next,
the read stream callback. And just having comments that explain those
invariants.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Marthin Laubscher 2025-08-17 21:11:06 Re: About Custom Aggregates, C Extensions and Memory
Previous Message Jelte Fennema-Nio 2025-08-17 17:12:51 Re: [PATCH] GROUP BY ALL