Quick Links

Re: index prefetching

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject:	Re: index prefetching
Date:	2025-07-23 12:50:15
Message-ID:	01fa8229-3235-4ce4-85a5-2adbaa90da64@vondra.me
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 7/23/25 02:59, Andres Freund wrote:
> Hi,
>
> On 2025-07-23 02:50:04 +0200, Tomas Vondra wrote:
>> But I don't see why would this have any effect on the prefetch distance,
>> queue depth etc. Or why decreasing INDEX_SCAN_MAX_BATCHES should improve
>> that. I'd have expected exactly the opposite behavior.
>>
>> Could be bug, of course. But it'd be helpful to see the dataset/query.
>
> Pgbench scale 500, with the simpler query from my message.
>

I tried to reproduce this, but I'm not seeing behavior. I'm not sure how
you monitor the queue depth (presumably iostat?), but I added a basic
prefetch info to explain (see the attached WIP patch), reporting the
average prefetch distance, number of stalls (with distance=0) and stream
resets (after filling INDEX_SCAN_MAX_BATCHES).

And I see this (there's a complete explain output attached) for the two
queries from your message [1]. The

simple query:

SELECT max(abalance) FROM (SELECT * FROM pgbench_accounts ORDER BY aid
LIMIT 10000000);

complex query:

SELECT max(abalance), min(abalance), sum(abalance::numeric),
avg(abalance::numeric), avg(aid::numeric), avg(bid::numeric) FROM
(SELECT * FROM pgbench_accounts ORDER BY aid LIMIT 10000000);

The stats actually look *exactly* the same, which makes sense because
it's reading the same index.

max_batches distance stalls resets stalls/reset
--------------------------------------------------------------------
64 272 3 3 1
32 59 122939 653 188
16 36 108101 1190 90
8 21 98775 2104 46
4 11 95627 4556 20

I think this behavior mostly matches my expectations, although it's
interesting the stalls jump so much between 64 and 32 batches.

I did test both with buffered I/O (io_method=sync) and direct I/O
(io_method=worker), and the results are exactly the same for me. Not the
timings, of course, but the prefetch stats.

Of course, maybe there's something wrong in how the stats are collected.
I wonder if maybe we should update the distance in get_block() and not
in next_buffer().

Or maybe there's some interference from having to read the leaf pages
sooner. But I don't see why that would affect the queue depth, fewer
reset should keep the queues fuller I think.

I'll think about adding some sort of distance histogram to the stats.
Maybe something like tinyhist [2] would work here.

[1]
https://www.postgresql.org/message-id/h2n7d7zb2lbkdcemopvrgmteo35zzi5ljl2jmk32vz5f4pziql%407ppr6r6yfv4z

[2] https://github.com/tvondra/tinyhist

regards

--
Tomas Vondra

Attachment	Content-Type	Size
prefetch-distance-explain.patch	text/x-patch	7.6 KB
prefetch-explains.log	text/x-log	7.7 KB

In response to

Re: index prefetching at 2025-07-23 00:59:07 from Andres Freund

Responses

Re: index prefetching at 2025-07-23 15:09:18 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Ashutosh Bapat	2025-07-23 15:06:32	Re: Document transition table triggers are not allowed on views/foreign tables
Previous Message	Frédéric Yhuel	2025-07-23 12:45:10	Re: vacuumdb changes for stats import/export