From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Subject: | Re: index prefetching |
Date: | 2025-08-25 17:50:27 |
Message-ID: | 99028cb4-2782-43fe-b7aa-590b9692b040@vondra.me |
Lists: | pgsql-hackers |
On 8/25/25 17:43, Thomas Munro wrote:
> On Tue, Aug 26, 2025 at 2:18 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> Of course, this can happen even with other hit ratios, there's nothing
>> special about 50%.
>
> Right, that's what this patch was attacking directly, basically only
> giving up when misses are so sparse we can't do anything about it for
> an ordered stream:
>
> https://www.postgresql.org/message-id/CA%2BhUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BAdrZqKZkw%40mail.gmail.com
>
> aio: Improve read_stream.c look-ahead heuristics C
>
> Previously we would reduce the look-ahead distance by one every time we
> got a cache hit, which sometimes performed poorly with mixed hit/miss
> patterns, especially if it was trapped at one.
>
> Instead, sustain the current distance until we've seen evidence that
> there is no window big enough to span the gap between rare IOs. In
> other words, we now use information from a much larger window to
> estimate the utility of looking far ahead.
Ah, I forgot about this patch.
There have been too many PoC / experimental patches with read_stream
improvements, and I'm losing track of them. I'm ready to do some
evaluation, but it's not clear which ones to evaluate. Could you
maybe consolidate them into a patch series that I could benchmark?
I did give this patch a try with the dataset/query shared in [1], and
the explain looks like this:
                             QUERY PLAN
---------------------------------------------------------------------
 Index Scan using idx on t (actual rows=9048576.00 loops=1)
   Index Cond: ((a >= 16150) AND (a <= 4540437))
   Index Searches: 1
   Prefetch Distance: 271.999
   Prefetch Count: 4339129
   Prefetch Stalls: 386
   Prefetch Skips: 6039906
   Prefetch Resets: 0
   Stream Ungets: 1331122
   Stream Forwarded: 306719
   Prefetch Histogram: [2,4) => 10, [4,8) => 2, [8,16) => 2,
                       [16,32) => 2, [32,64) => 2, [64,128) => 3,
                       [256,512) => 4339108
   Buffers: shared hit=2573920 read=455610
 Planning:
   Buffers: shared hit=83 read=26
 Planning Time: 4.142 ms
 Execution Time: 1694.368 ms
(16 rows)
which is pretty good, and pretty much on par with master (so no
regression).
It's a bit strange the distance ends up being that high, though. The
explain says:
Prefetch Distance: 271.999
There are ~70% misses on average, so isn't 272 a bit too high? Wouldn't
that cause too many concurrent IOs? Maybe I'm interpreting this wrong,
or maybe the explain stats are not quite right.
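One quick way to check whether the explain stats at least agree with each other is to compare the histogram against the reported count and average. This is only a sketch: the numbers are copied from the plan above, and it assumes the distances are integers and that each bucket is the half-open range [lo, hi), which is my reading of the output rather than something I've verified in the code.

```python
# Bucket counts copied from the "Prefetch Histogram" line of the plan above.
# Assumption: each key is a half-open range [lo, hi) of integer distances.
histogram = {
    (2, 4): 10,
    (4, 8): 2,
    (8, 16): 2,
    (16, 32): 2,
    (32, 64): 2,
    (64, 128): 3,
    (256, 512): 4339108,
}
prefetch_count = 4339129   # "Prefetch Count" from the same plan
reported_mean = 271.999    # "Prefetch Distance" from the same plan

total = sum(histogram.values())
top_share = histogram[(256, 512)] / total

# Bounds on the mean implied by the buckets alone: every sample is at least
# the bucket's lower edge and at most its upper edge minus one.
mean_low = sum(lo * n for (lo, hi), n in histogram.items()) / total
mean_high = sum((hi - 1) * n for (lo, hi), n in histogram.items()) / total

print(f"histogram total = {total} (Prefetch Count = {prefetch_count})")
print(f"share of prefetches in [256,512) = {top_share:.6f}")
print(f"mean distance must lie in [{mean_low:.3f}, {mean_high:.3f}]")
print(f"reported mean = {reported_mean}")
```

The bucket counts add up to the Prefetch Count, and the reported average falls inside the range the histogram allows, so the stats look internally consistent. Whether a distance that high is appropriate for this hit ratio is a separate question.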
For comparison, the patch from [1] ends up with this:
Prefetch Distance: 36.321
In any case, the patch seems to help, and maybe it's a better approach.
I need to take a closer look.
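To make the difference between the two rules in the quoted commit message concrete, here is a toy model. It is not the read_stream.c code: the cap of 64, doubling the distance on a miss, and using "more hits in a row than the cap" as the evidence that no window can span the gap are all assumptions made just for this illustration, and every name in it is hypothetical.

```python
MAX_DISTANCE = 64  # hypothetical cap on look-ahead distance for this toy model


def old_policy(distance, is_hit, state):
    """Shrink by one on every cache hit, double on every miss."""
    if is_hit:
        return max(1, distance - 1)
    return min(MAX_DISTANCE, distance * 2)


def new_policy(distance, is_hit, state):
    """Hold the distance until the run of hits outgrows any usable window."""
    if not is_hit:
        state["hits_since_miss"] = 0
        return min(MAX_DISTANCE, distance * 2)
    state["hits_since_miss"] += 1
    if state["hits_since_miss"] > MAX_DISTANCE:
        # Misses are too sparse for look-ahead to overlap them: back off.
        return max(1, distance - 1)
    return distance


def simulate(policy, hits):
    """Return the look-ahead distance after feeding a hit/miss sequence."""
    distance = 1
    state = {"hits_since_miss": 0}
    for is_hit in hits:
        distance = policy(distance, is_hit, state)
    return distance


# One miss every 10 blocks, everything else a cache hit.
mixed = [i % 10 != 0 for i in range(1000)]
# One miss every 500 blocks: gaps far wider than MAX_DISTANCE.
sparse = [i % 500 != 0 for i in range(1000)]

for name, pattern in (("mixed", mixed), ("sparse", sparse)):
    print(name, simulate(old_policy, pattern), simulate(new_policy, pattern))
```

In this model the old rule ends the mixed pattern stuck at distance 1, while the new rule climbs to the cap and stays there. Both fall back to 1 on the sparse pattern, which matches the "only giving up when misses are so sparse" description.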
regards
[1]
https://www.postgresql.org/message-id/8f5d66cf-44e9-40e0-8349-d5590ba8efb4%40vondra.me
--
Tomas Vondra