| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)bowt(dot)ie> |
| Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Subject: | Re: index prefetching |
| Date: | 2025-12-18 14:40:59 |
| Message-ID: | d8e750ae-67cf-4e21-8ef6-e161656c90c5@vondra.me |
| Lists: | pgsql-hackers |
On 12/18/25 14:57, Konstantin Knizhnik wrote:
>
> On 17/12/2025 9:54 PM, Tomas Vondra wrote:
>> On 12/17/25 20:30, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2025-12-17 13:49:43 -0500, Peter Geoghegan wrote:
>>>> On Wed, Dec 17, 2025 at 12:19 PM Konstantin Knizhnik
>>>> <knizhnik(at)garret(dot)ru> wrote:
>>>>> Moreover with `enable_indexscan_prefetch=off` results are the same.
>>>> It's quite unlikely that the current heuristics that trigger
>>>> prefetching would have ever allowed any prefetching, for queries such
>>>> as these.
>>>>
>>>> The exact rule right now is that we don't even begin prefetching until
>>>> we've already read at least one index leaf page, and have to read
>>>> another one. So it's impossible to use prefetching with a LIMIT of 1,
>>>> with queries such as these. It's highly unlikely that you'd see any
>>>> benefits from prefetching even with LIMIT 100 (usually we wouldn't
>>>> even begin prefetching).
>>> Note that due to the tuple size and fillfactor in Konstantin's
>>> workload, there
>>> will be one tuple per page... That should allow for some prefetching.
>>>
>> Yes, but that's in the heap. The mechanism Peter described is about leaf
>> pages in the index, and the index has the usual fillfactor. So there'll
>> be many index entries per leaf.
>>
> I slightly changed my benchmark setup:
>
> create table t (pk integer, sk integer, payload text default repeat('x',
> 1000)) with (fillfactor=10);
> insert into t values (generate_series(1,10000000),random()*10000000);
> create index on t(sk);
>
> select.sql:
>
> \set sk random(1, 10000000)
> select * from t where sk >= :sk order by sk limit N;
>
> You are right. There is almost no effect of prefetch for limit=100, but
> ~2x times improvement for limit=1000:
>
> eio\limit      1    100   1000
>        10  11102    142     28
>         0  11419    137     14
>
> master:
>     limit      1    100   1000
>            11480    130     13
>
> One of the motivations for my experiments was to check that there is no
> degradation of performance because of batching.
> And it is nice that there is no performance penalty here.
> Still, it is not quite clear to me why there is no positive effect
> for LIMIT 100.
The technical reason is that batch_getnext() does this:
    /* Delay initializing stream until reading from scan's second batch */
    if (priorbatch && !scan->xs_heapfetch->rs && !batchqueue->disabled &&
        enable_indexscan_prefetch)
        scan->xs_heapfetch->rs =
            read_stream_begin_relation(READ_STREAM_DEFAULT, NULL,
                                       ....);
which means we create the read_stream (which is what enables the
prefetching) only when creating the second batch. And with LIMIT 100 we
likely read just a single leaf page (= batch) most of the time, which
means no read_stream and thus no prefetching.
You can try disabling this "priorbatch" condition, so that the
read_stream gets created right away.
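Just to illustrate what that experiment would look like (a sketch against
the snippet above only, the surrounding code may differ):

    /*
     * Experiment: drop the "priorbatch" check, so the read_stream is
     * created already for the first batch and prefetching can kick in
     * even when the scan never reads a second leaf page.
     */
    if (!scan->xs_heapfetch->rs && !batchqueue->disabled &&
        enable_indexscan_prefetch)
        scan->xs_heapfetch->rs =
            read_stream_begin_relation(READ_STREAM_DEFAULT, NULL,
                                       ....);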
> Reading 100 random heap pages definitely should take advantage of AIO.
> We have also implemented prefetching for index only scans in Neon, and
> there the effect for a similar query is quite noticeable (~3x times).
> But in the Neon architecture the price of IO is much higher because it
> requires network communication with the page server.
>
True, but only if the data is not already in memory / shared buffers.
IIRC this "priorbatch" logic mitigates regressions for cached workloads,
because the read_stream initialization is expensive enough to hurt small
queries when no I/O is needed.
Maybe the tradeoff is different for Neon, which probably can't rely on
the cache that much? It's also true that tying this to the number of
batches is a bit coarse, because the batch size can vary a lot. It can
easily be a couple of items or hundreds of items.
I believe we're open to alternative ideas.
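For illustration only, a heuristic based on items rather than batches might
look roughly like this (nitems_seen and the threshold are made-up names for
the sketch, not existing code):

    /*
     * Hypothetical sketch: start the read_stream once the scan has
     * processed some number of index items, regardless of how many
     * batches (leaf pages) that took.
     */
    #define INDEX_PREFETCH_MIN_ITEMS    32  /* arbitrary threshold */

    if (scan->xs_heapfetch->nitems_seen >= INDEX_PREFETCH_MIN_ITEMS &&
        !scan->xs_heapfetch->rs && !batchqueue->disabled &&
        enable_indexscan_prefetch)
        scan->xs_heapfetch->rs =
            read_stream_begin_relation(READ_STREAM_DEFAULT, NULL,
                                       ....);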
regards
--
Tomas Vondra