| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)bowt(dot)ie> |
| Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Subject: | Re: index prefetching |
| Date: | 2025-12-18 14:40:59 |
| Message-ID: | d8e750ae-67cf-4e21-8ef6-e161656c90c5@vondra.me |
| Lists: | pgsql-hackers |
On 12/18/25 14:57, Konstantin Knizhnik wrote:
>
> On 17/12/2025 9:54 PM, Tomas Vondra wrote:
>> On 12/17/25 20:30, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2025-12-17 13:49:43 -0500, Peter Geoghegan wrote:
>>>> On Wed, Dec 17, 2025 at 12:19 PM Konstantin Knizhnik
>>>> <knizhnik(at)garret(dot)ru> wrote:
>>>>> Moreover with `enable_indexscan_prefetch=off` results are the same.
>>>> It's quite unlikely that the current heuristics that trigger
>>>> prefetching would have ever allowed any prefetching, for queries such
>>>> as these.
>>>>
>>>> The exact rule right now is that we don't even begin prefetching until
>>>> we've already read at least one index leaf page, and have to read
>>>> another one. So it's impossible to use prefetching with a LIMIT of 1,
>>>> with queries such as these. It's highly unlikely that you'd see any
>>>> benefits from prefetching even with LIMIT 100 (usually we wouldn't
>>>> even begin prefetching).
>>> Note that due to the tuple size and fillfactor in Konstantin's
>>> workload, there
>>> will be one tuple per page... That should allow for some prefetching.
>>>
>> Yes, but that's in the heap. The mechanism Peter described is about leaf
>> pages in the index, and the index has the usual fillfactor. So there'll
>> be many index entries per leaf.
>>
> I slightly changed my benchmark setup:
>
> create table t (pk integer, sk integer, payload text default repeat('x',
> 1000)) with (fillfactor=10);
> insert into t values (generate_series(1,10000000),random()*10000000);
> create index on t(sk);
>
> select.sql:
>
> \set sk random(1, 10000000)
> select * from t where sk >= :sk order by sk limit N;
>
> You are right. There is almost no effect of prefetch for limit=100, but
> ~2x times improvement for limit=1000:
>
> eio\limit      1    100   1000
>        10  11102    142     28
>         0  11419    137     14
>
> master:
>     limit      1    100   1000
>            11480    130     13
>
> One of the motivations for my experiments was to check that there is no
> degradation of performance because of batching.
> And it is nice that there is no performance penalty here.
> Still, it is not quite clear to me why there is no positive effect
> for LIMIT 100.
The technical reason is that batch_getnext() does this:
    /* Delay initializing stream until reading from scan's second batch */
    if (priorbatch && !scan->xs_heapfetch->rs && !batchqueue->disabled &&
        enable_indexscan_prefetch)
        scan->xs_heapfetch->rs =
            read_stream_begin_relation(READ_STREAM_DEFAULT, NULL,
                                       ....);
which means we create the read_stream (which is what enables the
prefetching) only when creating the second batch. And with LIMIT 100 we
likely read just a single leaf page (= batch) most of the time, which
means no read_stream and thus no prefetching.
You can try disabling this "priorbatch" condition, so that the
read_stream gets created right away.
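Just to illustrate what that experiment would look like (a sketch against
the snippet above only, the surrounding code may differ):

    /*
     * Experiment: drop the "priorbatch" check, so the read_stream is
     * created already for the first batch and prefetching can kick in
     * even when the scan never reads a second leaf page.
     */
    if (!scan->xs_heapfetch->rs && !batchqueue->disabled &&
        enable_indexscan_prefetch)
        scan->xs_heapfetch->rs =
            read_stream_begin_relation(READ_STREAM_DEFAULT, NULL,
                                       ....);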
> Reading 100 random heap pages definitely should take advantage of AIO.
> We have also implemented prefetching for index only scans in Neon, and
> there the effect for a similar query is quite noticeable (~3x times).
> But in the Neon architecture the price of IO is much higher because it
> requires network communication with the page server.
>
True, but only if the data is not already in memory / shared buffers.
IIRC this "priorbatch" logic mitigates regressions for cached workloads,
because the read_stream initialization is expensive enough to hurt small
queries when no I/O is needed.
Maybe the tradeoff is different for Neon, which probably can't rely on
the cache that much? It's also true that tying this to the number of
batches is a bit coarse, because the batch size can vary a lot. It can
easily be a couple of items or hundreds of items.
I believe we're open to alternative ideas.
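For illustration only, a heuristic based on items rather than batches might
look roughly like this (nitems_seen and the threshold are made-up names for
the sketch, not existing code):

    /*
     * Hypothetical sketch: start the read_stream once the scan has
     * processed some number of index items, regardless of how many
     * batches (leaf pages) that took.
     */
    #define INDEX_PREFETCH_MIN_ITEMS    32  /* arbitrary threshold */

    if (scan->xs_heapfetch->nitems_seen >= INDEX_PREFETCH_MIN_ITEMS &&
        !scan->xs_heapfetch->rs && !batchqueue->disabled &&
        enable_indexscan_prefetch)
        scan->xs_heapfetch->rs =
            read_stream_begin_relation(READ_STREAM_DEFAULT, NULL,
                                       ....);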
regards
--
Tomas Vondra