Re: index prefetching

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject:	Re: index prefetching
Date:	2025-08-25 17:50:27
Message-ID:	99028cb4-2782-43fe-b7aa-590b9692b040@vondra.me
Lists:	pgsql-hackers

On 8/25/25 17:43, Thomas Munro wrote:
> On Tue, Aug 26, 2025 at 2:18 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> Of course, this can happen even with other hit ratios, there's nothing
>> special about 50%.
>
> Right, that's what this patch was attacking directly, basically only
> giving up when misses are so sparse we can't do anything about it for
> an ordered stream:
>
> https://www.postgresql.org/message-id/CA%2BhUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BAdrZqKZkw%40mail.gmail.com
>
> aio: Improve read_stream.c look-ahead heuristics C
>
> Previously we would reduce the look-ahead distance by one every time we
> got a cache hit, which sometimes performed poorly with mixed hit/miss
> patterns, especially if it was trapped at one.
>
> Instead, sustain the current distance until we've seen evidence that
> there is no window big enough to span the gap between rare IOs. In
> other words, we now use information from a much larger window to
> estimate the utility of looking far ahead.
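The difference between the two heuristics described in the quoted commit message can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the code in read_stream.c: the function name `simulate`, the doubling of the distance on a miss, and the "start decaying only after more consecutive hits than the maximum distance" rule are all hypothetical stand-ins chosen to mirror the prose.

```python
def simulate(hits, max_distance, sustain):
    """Return the look-ahead distance after each block is consumed.

    hits: sequence of bools, True = cache hit, False = miss (needs IO).
    sustain=False: decrement the distance by one on every hit
                   (the old behaviour described in the commit message).
    sustain=True:  hold the distance, and only decay once the run of
                   hits is longer than max_distance (hypothetical rule
                   standing in for "no window big enough").
    Assumption: a miss doubles the distance, capped at max_distance.
    """
    distance = 1
    hits_since_miss = 0
    trace = []
    for hit in hits:
        if not hit:
            distance = min(distance * 2, max_distance)
            hits_since_miss = 0
        else:
            hits_since_miss += 1
            if not sustain or hits_since_miss > max_distance:
                distance = max(distance - 1, 1)
        trace.append(distance)
    return trace


# Mixed pattern: one miss followed by three hits, repeated.
pattern = [False, True, True, True] * 50

old = simulate(pattern, max_distance=64, sustain=False)
new = simulate(pattern, max_distance=64, sustain=True)

print("decrement on hit: max distance", max(old), "final", old[-1])
print("sustain distance: max distance", max(new), "final", new[-1])
```

With this pattern the decrement-on-hit variant never gets past a distance of 2 and keeps falling back to 1, while the sustaining variant ramps up to the cap and stays there.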

Ah, I forgot about this patch.

There have been too many PoC / experimental patches with read_stream
improvements, and I'm losing track of them. I'm ready to do some
evaluation, but it's not clear which ones to evaluate, etc. Could you
maybe consolidate them into a patch series that I could benchmark?

I did give this patch a try with the dataset/query shared in [1], and
the explain looks like this:

                             QUERY PLAN
---------------------------------------------------------------------
 Index Scan using idx on t (actual rows=9048576.00 loops=1)
   Index Cond: ((a >= 16150) AND (a <= 4540437))
   Index Searches: 1
   Prefetch Distance: 271.999
   Prefetch Count: 4339129
   Prefetch Stalls: 386
   Prefetch Skips: 6039906
   Prefetch Resets: 0
   Stream Ungets: 1331122
   Stream Forwarded: 306719
   Prefetch Histogram: [2,4) => 10, [4,8) => 2, [8,16) => 2,
                       [16,32) => 2, [32,64) => 2, [64,128) => 3,
                       [256,512) => 4339108
   Buffers: shared hit=2573920 read=455610
 Planning:
   Buffers: shared hit=83 read=26
 Planning Time: 4.142 ms
 Execution Time: 1694.368 ms
(16 rows)

which is pretty good, and pretty much on par with master (so no
regression, which is good).

It's a bit strange the distance ends up being that high, though. The
explain says:

Prefetch Distance: 271.999

There's ~70% misses on average, so isn't 272 a bit too high? Wouldn't
that cause too many concurrent IOs? Maybe I'm interpreting this wrong,
or maybe the explain stats are not quite right.

For comparison, the patch from [1] ends up with this:

Prefetch Distance: 36.321
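As a rough way to read the two distances above, here is a back-of-the-envelope sketch. It assumes misses are spread uniformly, that every miss inside the look-ahead window counts as one in-flight IO, and that nothing caps or combines IOs; none of this is taken from the patches, and `expected_inflight` is a hypothetical helper.

```python
def expected_inflight(distance, miss_fraction):
    """Estimate concurrent IOs as distance * miss fraction.

    Simplifying assumptions (not from the patches): misses are evenly
    spread, each miss in the window is one in-flight IO, and there is
    no cap on or merging of concurrent IOs.
    """
    return distance * miss_fraction


MISS_FRACTION = 0.70  # the "~70% misses on average" figure from the email

with_this_patch = expected_inflight(271.999, MISS_FRACTION)
with_patch_from_1 = expected_inflight(36.321, MISS_FRACTION)

print(f"distance 271.999 -> about {with_this_patch:.0f} IOs in flight")
print(f"distance  36.321 -> about {with_patch_from_1:.0f} IOs in flight")
```

Under those assumptions the first distance would mean on the order of 190 IOs in flight versus about 25 for the second, which is the concern raised above. The real number depends on how the stream limits and merges IOs.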

In any case, the patch seems to help, and maybe it's a better approach,
I need to take a closer look.

regards

[1]
https://www.postgresql.org/message-id/8f5d66cf-44e9-40e0-8349-d5590ba8efb4%40vondra.me

--
Tomas Vondra

In response to

Re: index prefetching at 2025-08-25 15:43:04 from Thomas Munro