| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Subject: | Re: index prefetching |
| Date: | 2026-02-27 04:18:03 |
| Message-ID: | issqornf6vdn3vb64fjuoathypmu3e5pgputd3lpfuvoeqyvzr@qfordnhplp2v |
| Lists: | pgsql-hackers |
Hi,
On 2026-02-24 13:13:25 -0500, Peter Geoghegan wrote:
> > Plausible. It could be that we could get away with controlling the rampup to
> > be slower in potentially problematic cases, without needing the yielding, but
> > not sure.
>
> Attached is v11, which makes the read stream yielding mechanism better
> cooperate with index prefetching, so as to avoid interfering with
> io_combine_limit. This should deal with the odd performance that you
> complained about. See
> v11-0006-Introduce-read_stream_-pause-resume-yield.patch (and the
> later prefetching patch
> v11-0007-Add-heapam-index-scan-I-O-prefetching.patch) for details.
>
> The whole idea of measuring "batch distance" is gone in this version,
> though we do still only consider whether now is a good time to yield
> at "batch boundaries". We always refuse to yield on the first few batches
> of the scan, so the idea of caring about batch boundaries is still
> there, albeit in a much more limited form.
I'm planning to do some reviewing in the next few days. In preparation I just
retried a benchmark and saw some odd results. After a while I was able to
reproduce them even with a simpler setup:
-c shared_buffers=2GB -c debug_io_direct=data -c io_method=io_uring
pgbench -i -q -s 100 --fillfactor=90
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.43..441511.11 rows=10000045 width=97) (actual time=0.308..6101.837 rows=10000000.00 loops=1) │
│ Index Searches: 1 │
│ Buffers: shared hit=27325 read=181819 │
│ I/O Timings: shared read=4538.003 │
│ Planning Time: 0.041 ms │
│ Execution Time: 6433.192 ms │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
pgbench -i -q -s 100 --fillfactor=50
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.43..593022.41 rows=9999798 width=97) (actual time=0.131..3973.698 rows=10000000.00 loops=1) │
│ Index Searches: 1 │
│ Buffers: shared hit=19239 read=341420 │
│ I/O Timings: shared read=1752.057 │
│ Planning: │
│ Buffers: shared hit=42 read=15 │
│ Planning Time: 1.668 ms │
│ Execution Time: 4308.182 ms │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
pgbench -i -q -s 100 --fillfactor=25
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.43..926358.51 rows=10000005 width=97) (actual time=0.112..3259.362 rows=10000000.00 loops=1) │
│ Index Searches: 1 │
│ Buffers: shared hit=9610 read=684382 │
│ I/O Timings: shared read=242.259 │
│ Planning: │
│ Buffers: shared hit=18 │
│ Planning Time: 0.097 ms │
│ Execution Time: 3594.782 ms │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Note how the increase in scanned heap pages actually *decreases* the overall
time rather substantially.
It's quite visible, both in iostat and with a query like
SELECT pid, target_desc, off, length FROM pg_aios \watch 0.5
that the first query has basically no IO concurrency, the second has very
intermittent IO concurrency, and the third has nice IO concurrency.
If I disable the yield logic, the fillfactor=90 case is good:
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.43..441511.11 rows=10000045 width=97) (actual time=0.470..1662.331 rows=10000000.00 loops=1) │
│ Index Searches: 1 │
│ Buffers: shared hit=27325 read=181819 │
│ I/O Timings: shared read=21.113 │
│ Planning Time: 0.043 ms │
│ Execution Time: 1995.723 ms │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Of course this is a silly query, but you'd see the same effect with a merge
join or similar.
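To put the four plans above side by side, here is a rough sketch that derives effective read throughput and the share of runtime spent waiting on reads. The block counts and timings are copied from the plans in this message; the 8 kB block size is an assumption (the default BLCKSZ), and the names RUNS, summarize and SUMMARY are made up for illustration. Dividing by total execution time is only a crude proxy, since it includes CPU time as well.

```python
# Figures copied from the query plans in this message.
# Each entry: (label, shared blocks read, "I/O Timings: shared read" in ms,
#              "Execution Time" in ms)
RUNS = [
    ("fillfactor=90", 181819, 4538.003, 6433.192),
    ("fillfactor=50", 341420, 1752.057, 4308.182),
    ("fillfactor=25", 684382, 242.259, 3594.782),
    ("fillfactor=90, yield disabled", 181819, 21.113, 1995.723),
]

BLOCK_SIZE = 8192  # assumption: default BLCKSZ of 8 kB


def summarize(blocks_read, io_ms, exec_ms):
    """Derive throughput and I/O wait share for one run."""
    secs = exec_ms / 1000.0
    return {
        "blocks_per_sec": blocks_read / secs,
        "mib_per_sec": blocks_read * BLOCK_SIZE / secs / (1024 * 1024),
        "io_wait_fraction": io_ms / exec_ms,
    }


SUMMARY = {label: summarize(b, io, ex) for label, b, io, ex in RUNS}

if __name__ == "__main__":
    for label, s in SUMMARY.items():
        print(
            f"{label:32s} {s['blocks_per_sec']:10.0f} blocks/s "
            f"{s['mib_per_sec']:8.1f} MiB/s  "
            f"I/O wait {s['io_wait_fraction']:.1%}"
        )
```

By this measure the fillfactor=90 run with yielding spends most of its runtime waiting on reads, while the same data with yielding disabled spends only a small fraction, which fits the lack of IO concurrency seen in pg_aios.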
Greetings,
Andres Freund