From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Subject: | Re: index prefetching |
Date: | 2025-07-16 19:39:58 |
Message-ID: | obwrjsclfm2dcymo6izs3dz55h364nifkcqof7hjw4fmw3kdqe@yddoyzbrizly |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2025-07-16 14:30:05 -0400, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 2:27 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Could you share the current version of the complex patch (happy with a git
> > tree)? Afaict it hasn't been posted, which makes this pretty hard follow along
> > / provide feedback on, for others.
>
> Sure:
>
> https://github.com/petergeoghegan/postgres/tree/index-prefetch-2025-pg-revisions-v0.11
>
> I think that the version that Tomas must have used is a few days old,
> and might be a tiny bit different. But I don't think that that's
> likely to matter, especially not if you just want to get the general
> idea.
As a first thing I just wanted to get a feel for the improvements we can get.
I had a scale 5 tpch already loaded, so I ran a bogus query on that to see.
The improvement with either of the patchsets with a quick trial query is
rather impressive when using direct IO (presumably also with an empty cache,
but DIO is more predictable).
As Peter's branch doesn't seem to have an enable_* GUC, I used
SET effective_io_concurrency=0 to test the non-prefetching results (and
verified with master that the results are similar).
Test:
Peter's:
Without prefetching:
SET effective_io_concurrency=0;SELECT pg_buffercache_evict_relation('lineitem');EXPLAIN ANALYZE SELECT * FROM lineitem ORDER BY l_shipdate LIMIT 10000;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit (cost=0.44..2332.06 rows=10000 width=106) (actual time=0.611..957.874 rows=10000.00 loops=1) │
│ Buffers: shared hit=1213 read=8626 │
│ I/O Timings: shared read=943.344 │
│ -> Index Scan using i_l_shipdate on lineitem (cost=0.44..6994824.33 rows=29999796 width=106) (actual time=0.611..956.593 rows=10000.00 loops=1) │
│ Index Searches: 1 │
│ Buffers: shared hit=1213 read=8626 │
│ I/O Timings: shared read=943.344 │
│ Planning Time: 0.083 ms │
│ Execution Time: 958.508 ms │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)
With prefetching:
SET effective_io_concurrency=64;SELECT pg_buffercache_evict_relation('lineitem');EXPLAIN ANALYZE SELECT * FROM lineitem ORDER BY l_shipdate LIMIT 10000;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit (cost=0.44..2332.06 rows=10000 width=106) (actual time=0.497..67.737 rows=10000.00 loops=1) │
│ Buffers: shared hit=1227 read=8667 │
│ I/O Timings: shared read=48.473 │
│ -> Index Scan using i_l_shipdate on lineitem (cost=0.44..6994824.33 rows=29999796 width=106) (actual time=0.496..66.471 rows=10000.00 loops=1) │
│ Index Searches: 1 │
│ Buffers: shared hit=1227 read=8667 │
│ I/O Timings: shared read=48.473 │
│ Planning Time: 0.090 ms │
│ Execution Time: 68.965 ms │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)
Tomas':
With prefetching:
SET effective_io_concurrency=64;SELECT pg_buffercache_evict_relation('lineitem');EXPLAIN ANALYZE SELECT * FROM lineitem ORDER BY l_shipdate LIMIT 10000;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit (cost=0.44..2332.06 rows=10000 width=106) (actual time=0.278..70.609 rows=10000.00 loops=1) │
│ Buffers: shared hit=1227 read=8668 │
│ I/O Timings: shared read=52.578 │
│ -> Index Scan using i_l_shipdate on lineitem (cost=0.44..6994824.33 rows=29999796 width=106) (actual time=0.277..69.304 rows=10000.00 loops=1) │
│ Index Searches: 1 │
│ Buffers: shared hit=1227 read=8668 │
│ I/O Timings: shared read=52.578 │
│ Planning Time: 0.072 ms │
│ Execution Time: 71.549 ms │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)
The wins are similar without DIO and a cold OS cache, but i don't like
emptying out the entire OS cache all the time...
I call that a hell of an impressive improvement with either patch - it's
really really hard to find order of magnitude improvements in anything close
to realistic cases.
And that's on a local reasonably fast NVMe - with networked storage we'll see
much bigger wins.
This also doesn't just repro with toy queries, e.g. TPCH Q02 shows a 2X
improvement too (with either patch) - the only reason it's not bigger is that
all the remaining IO time is on the inner side of a nestloop that isn't
currently prefetchable.
Peter, it'd be rather useful if your patch also had an enable/disable GUC,
otherwise it's more work to study the performance effects. The
effective_io_concurrency approach isn't great, because it also affects
bitmap scans, seqscans etc.
Just playing around, there are many cases where there is effectively no
difference between the two approaches, from a runtime perspective. There,
unsurprisingly, are some where the complex approach clearly wins, mostly
around IN(list-of-constants) so far.
Looking at the actual patches now.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | DINESH NAIR | 2025-07-16 20:10:16 | Re: Composite types for updatable views |
Previous Message | Tom Lane | 2025-07-16 19:36:35 | Re: Fix PQport to never return NULL if the connection is valid |