From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2025-07-23 16:55:02
Message-ID: cffa6e3e-f9c4-468f-b218-a3a5e172c226@vondra.me
Lists: pgsql-hackers
On 7/23/25 17:09, Andres Freund wrote:
> Hi,
>
> On 2025-07-23 14:50:15 +0200, Tomas Vondra wrote:
>> On 7/23/25 02:59, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2025-07-23 02:50:04 +0200, Tomas Vondra wrote:
>>>> But I don't see why would this have any effect on the prefetch distance,
>>>> queue depth etc. Or why decreasing INDEX_SCAN_MAX_BATCHES should improve
>>>> that. I'd have expected exactly the opposite behavior.
>>>>
>>>> Could be bug, of course. But it'd be helpful to see the dataset/query.
>>>
>>> Pgbench scale 500, with the simpler query from my message.
>>>
>>
>> I tried to reproduce this, but I'm not seeing that behavior. I'm not sure
>> how you monitor the queue depth (presumably iostat?)
>
> Yes, iostat, since I was looking at what the "actually required" lookahead
> distance is.
>
> Do you actually get the query to be entirely CPU bound? What amount of IO
> waiting do you see EXPLAIN (ANALYZE, TIMING OFF) with track_io_timing=on
> report?
>
No, it definitely needs to wait for I/O (FWIW it's on the Xeon, with a
single NVMe SSD).
> Ah - I was using a very high effective_io_concurrency. With a high
> effective_io_concurrency value I see a lot of stalls, even at
> INDEX_SCAN_MAX_BATCHES = 64. And a lower prefetch distance, which seems
> somewhat odd.
>
I think that's a bug in the explain patch. The counters were updated at
the beginning of _next_buffer(), but that's wrong, because a single call
to _next_buffer() can prefetch multiple blocks. This skewed the stats,
as those additional prefetches were not counted and instead showed up as
requests with distance=0. With higher eic this happens sooner, so the
average distance seemed to decrease.

The attached patch does the updates in _get_block(), which I think is
better. And "stall" now means (distance == 1), which I think correctly
detects requests without any prefetching.
I also added a separate "Count" for the actual number of prefetched
blocks, and "Skipped" for duplicate blocks that were skipped (the read
stream never even sees those, because they are skipped in the callback).
>
> FWIW, in my tests I was just evicting lineitem from shared buffers, since I
> wanted to test the heap prefetching, without stalls induced by blocking on
> index reads. But what I described happens with either.
>
> ;SET effective_io_concurrency = 256;SELECT pg_buffercache_evict_relation('pgbench_accounts'); explain (analyze, costs off, timing off) SELECT max(abalance) FROM (SELECT * FROM pgbench_accounts ORDER BY aid LIMIT 10000000);
> ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ QUERY PLAN │
> ├──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ Aggregate (actual rows=1.00 loops=1) │
> │ Buffers: shared hit=27369 read=164191 │
> │ I/O Timings: shared read=358.795 │
> │ -> Limit (actual rows=10000000.00 loops=1) │
> │ Buffers: shared hit=27369 read=164191 │
> │ I/O Timings: shared read=358.795 │
> │ -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) │
> │ Index Searches: 1 │
> │ Prefetch Distance: 256.989 │
> │ Prefetch Stalls: 3 │
> │ Prefetch Resets: 3 │
> │ Buffers: shared hit=27369 read=164191 │
> │ I/O Timings: shared read=358.795 │
> │ Planning Time: 0.086 ms │
> │ Execution Time: 4194.845 ms │
> └──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
>
> ;SET effective_io_concurrency = 512;SELECT pg_buffercache_evict_relation('pgbench_accounts'); explain (analyze, costs off, timing off) SELECT max(abalance) FROM (SELECT * FROM pgbench_accounts ORDER BY aid LIMIT 10000000);
> ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ QUERY PLAN │
> ├──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ Aggregate (actual rows=1.00 loops=1) │
> │ Buffers: shared hit=27368 read=164190 │
> │ I/O Timings: shared read=832.515 │
> │ -> Limit (actual rows=10000000.00 loops=1) │
> │ Buffers: shared hit=27368 read=164190 │
> │ I/O Timings: shared read=832.515 │
> │ -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) │
> │ Index Searches: 1 │
> │ Prefetch Distance: 56.778 │
> │ Prefetch Stalls: 160569 │
> │ Prefetch Resets: 423 │
> │ Buffers: shared hit=27368 read=164190 │
> │ I/O Timings: shared read=832.515 │
> │ Planning Time: 0.084 ms │
> │ Execution Time: 4413.058 ms │
> └──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
>
> Greetings,
>
The attached v2 explain patch should fix that. I'm also attaching logs
of my EXPLAIN output, for 64 and 16 batches. I think the output makes
much more sense now.
cheers
--
Tomas Vondra
Attachments:
- batches-64.log (text/x-log, 14.9 KB)
- batches-16.log (text/x-log, 17.7 KB)
- prefetch-distance-explain-v2.patch (text/x-patch, 8.9 KB)