Quick Links

Re: index prefetching

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject:	Re: index prefetching
Date:	2026-02-17 21:36:53
Message-ID:	720560ca-97b9-40a1-ad40-9f9b8a6648e9@vondra.me
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 2/17/26 21:16, Peter Geoghegan wrote:
> On Tue, Feb 17, 2026 at 2:27 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2026-02-17 12:16:23 -0500, Peter Geoghegan wrote:
>>> On Mon, Feb 16, 2026 at 11:48 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>>> I agree that the current heuristics (which were invented recently) are
>>> too conservative. I overfit the heuristics to my current set of
>>> adversarial queries, as a stopgap measure.
>>
>> Are you doing any testing on higher latency storage? I found it to be quite
>> valuable to use dm_delay to have a disk with reproducible (i.e. not cloud)
>> higher latency (i.e. not just a local SSD).
>
> I sometimes use dm_delay (with the minimum 1ms delay) when testing,
> but don't do so regularly. Just because it's inconvenient to do so
> (perhaps not a great reason).
>
>> Low latency NVMe can reduce the
>> penalty of not enough readahead so much that it's hard to spot problems...
>
> I'll keep that in mind.
>

So, what counts as "higher latency" in this context? What delays should
we consider practical/relevant for testing?

>>> ISTM that we need the yields to better cooperate with whatever's
>>> happening on the read stream side.
>>
>> Plausible. It could be that we could get away with controlling the rampup to
>> be slower in potentially problematic cases, without needing the yielding, but
>> not sure.
>>
>> If that doesn't work, it might just be sufficient to increase the number of
>> batches that trigger yields as the scan goes on (perhaps by taking the number
>> of already "consumed" batches into account).
>
> It could make sense to take the number of consumed batches into
> account. In general, I think the best approach will be one that
> combines multiple complementary strategies.
>

Yes, this is roughly what I meant by "ramp up". Start by limiting the
batch distance to 2, then gradually increase that during the scan.

> Passing down a LIMIT N hint has proven to be a good idea -- and it
> doesn't really require applying any information related to the read
> stream. That's enough to prevent problems in the really extreme cases
> (e.g., nested loop antijoins with a LIMIT 1 on the inner side). The
> problematic merge join I showed you is a not-so-extreme case, which
> makes it trickier. ISTM that taking into consideration the number of
> "consumed" batches will not help that particular merge join query,
> precisely because it's not-so-extreme: the inner index scan consumes
> plenty of batches, but is nevertheless significantly regressed (at
> least when we don't yield at all).
>
>> To evaluate the amount of wasted work, it could be useful to make the read
>> stream stats page spit out the amount of "unconsumed" IOs at the end of the
>> scan.
>
> That would make sense. You can already tell when that's happened by
> comparing the details shown by EXPLAIN ANALYZE against the same query
> execution on master, but that approach is inconvenient. Automating my
> microbenchmarks has proven to be important with this project. There's
> quite a few competing considerations, and it's too easy to improve one
> query at the cost of regressing another.
>

What counts as "unconsumed IO"? The IOs the stream already started, but
then did not consume? That shouldn't be hard, I think.

regards

--
Tomas Vondra

In response to

Re: index prefetching at 2026-02-17 20:16:06 from Peter Geoghegan

Responses

Re: index prefetching at 2026-02-18 04:21:28 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2026-02-17 21:56:11	Re: BackgroundPsql swallowing errors on windows
Previous Message	Peter Eisentraut	2026-02-17 21:36:06	Re: AIX support