From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Subject: | Re: index prefetching |
Date: | 2025-07-23 12:50:15 |
Message-ID: | 01fa8229-3235-4ce4-85a5-2adbaa90da64@vondra.me |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 7/23/25 02:59, Andres Freund wrote:
> Hi,
>
> On 2025-07-23 02:50:04 +0200, Tomas Vondra wrote:
>> But I don't see why would this have any effect on the prefetch distance,
>> queue depth etc. Or why decreasing INDEX_SCAN_MAX_BATCHES should improve
>> that. I'd have expected exactly the opposite behavior.
>>
>> Could be bug, of course. But it'd be helpful to see the dataset/query.
>
> Pgbench scale 500, with the simpler query from my message.
>
I tried to reproduce this, but I'm not seeing behavior. I'm not sure how
you monitor the queue depth (presumably iostat?), but I added a basic
prefetch info to explain (see the attached WIP patch), reporting the
average prefetch distance, number of stalls (with distance=0) and stream
resets (after filling INDEX_SCAN_MAX_BATCHES).
And I see this (there's a complete explain output attached) for the two
queries from your message [1]. The
simple query:
SELECT max(abalance) FROM (SELECT * FROM pgbench_accounts ORDER BY aid
LIMIT 10000000);
complex query:
SELECT max(abalance), min(abalance), sum(abalance::numeric),
avg(abalance::numeric), avg(aid::numeric), avg(bid::numeric) FROM
(SELECT * FROM pgbench_accounts ORDER BY aid LIMIT 10000000);
The stats actually look *exactly* the same, which makes sense because
it's reading the same index.
max_batches distance stalls resets stalls/reset
--------------------------------------------------------------------
64 272 3 3 1
32 59 122939 653 188
16 36 108101 1190 90
8 21 98775 2104 46
4 11 95627 4556 20
I think this behavior mostly matches my expectations, although it's
interesting the stalls jump so much between 64 and 32 batches.
I did test both with buffered I/O (io_method=sync) and direct I/O
(io_method=worker), and the results are exactly the same for me. Not the
timings, of course, but the prefetch stats.
Of course, maybe there's something wrong in how the stats are collected.
I wonder if maybe we should update the distance in get_block() and not
in next_buffer().
Or maybe there's some interference from having to read the leaf pages
sooner. But I don't see why that would affect the queue depth, fewer
reset should keep the queues fuller I think.
I'll think about adding some sort of distance histogram to the stats.
Maybe something like tinyhist [2] would work here.
[2] https://github.com/tvondra/tinyhist
regards
--
Tomas Vondra
Attachment | Content-Type | Size |
---|---|---|
prefetch-distance-explain.patch | text/x-patch | 7.6 KB |
prefetch-explains.log | text/x-log | 7.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Ashutosh Bapat | 2025-07-23 15:06:32 | Re: Document transition table triggers are not allowed on views/foreign tables |
Previous Message | Frédéric Yhuel | 2025-07-23 12:45:10 | Re: vacuumdb changes for stats import/export |