Re: index prefetching

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2025-07-16 19:39:58
Message-ID: obwrjsclfm2dcymo6izs3dz55h364nifkcqof7hjw4fmw3kdqe@yddoyzbrizly
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-07-16 14:30:05 -0400, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 2:27 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Could you share the current version of the complex patch (happy with a git
> > tree)? Afaict it hasn't been posted, which makes this pretty hard follow along
> > / provide feedback on, for others.
>
> Sure:
>
> https://github.com/petergeoghegan/postgres/tree/index-prefetch-2025-pg-revisions-v0.11
>
> I think that the version that Tomas must have used is a few days old,
> and might be a tiny bit different. But I don't think that that's
> likely to matter, especially not if you just want to get the general
> idea.

As a first thing I just wanted to get a feel for the improvements we can get.
I had a scale 5 tpch already loaded, so I ran a bogus query on that to see.

The improvement with either of the patchsets with a quick trial query is
rather impressive when using direct IO (presumably also with an empty cache,
but DIO is more predictable).

As Peter's branch doesn't seem to have an enable_* GUC, I used
SET effective_io_concurrency=0 to test the non-prefetching results (and
verified with master that the results are similar).

Test:

Peter's:

Without prefetching:

SET effective_io_concurrency=0;SELECT pg_buffercache_evict_relation('lineitem');EXPLAIN ANALYZE SELECT * FROM lineitem ORDER BY l_shipdate LIMIT 10000;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit (cost=0.44..2332.06 rows=10000 width=106) (actual time=0.611..957.874 rows=10000.00 loops=1) │
│ Buffers: shared hit=1213 read=8626 │
│ I/O Timings: shared read=943.344 │
│ -> Index Scan using i_l_shipdate on lineitem (cost=0.44..6994824.33 rows=29999796 width=106) (actual time=0.611..956.593 rows=10000.00 loops=1) │
│ Index Searches: 1 │
│ Buffers: shared hit=1213 read=8626 │
│ I/O Timings: shared read=943.344 │
│ Planning Time: 0.083 ms │
│ Execution Time: 958.508 ms │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

With prefetching:

SET effective_io_concurrency=64;SELECT pg_buffercache_evict_relation('lineitem');EXPLAIN ANALYZE SELECT * FROM lineitem ORDER BY l_shipdate LIMIT 10000;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit (cost=0.44..2332.06 rows=10000 width=106) (actual time=0.497..67.737 rows=10000.00 loops=1) │
│ Buffers: shared hit=1227 read=8667 │
│ I/O Timings: shared read=48.473 │
│ -> Index Scan using i_l_shipdate on lineitem (cost=0.44..6994824.33 rows=29999796 width=106) (actual time=0.496..66.471 rows=10000.00 loops=1) │
│ Index Searches: 1 │
│ Buffers: shared hit=1227 read=8667 │
│ I/O Timings: shared read=48.473 │
│ Planning Time: 0.090 ms │
│ Execution Time: 68.965 ms │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

Tomas':

With prefetching:

SET effective_io_concurrency=64;SELECT pg_buffercache_evict_relation('lineitem');EXPLAIN ANALYZE SELECT * FROM lineitem ORDER BY l_shipdate LIMIT 10000;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit (cost=0.44..2332.06 rows=10000 width=106) (actual time=0.278..70.609 rows=10000.00 loops=1) │
│ Buffers: shared hit=1227 read=8668 │
│ I/O Timings: shared read=52.578 │
│ -> Index Scan using i_l_shipdate on lineitem (cost=0.44..6994824.33 rows=29999796 width=106) (actual time=0.277..69.304 rows=10000.00 loops=1) │
│ Index Searches: 1 │
│ Buffers: shared hit=1227 read=8668 │
│ I/O Timings: shared read=52.578 │
│ Planning Time: 0.072 ms │
│ Execution Time: 71.549 ms │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

The wins are similar without DIO and a cold OS cache, but i don't like
emptying out the entire OS cache all the time...

I call that a hell of an impressive improvement with either patch - it's
really really hard to find order of magnitude improvements in anything close
to realistic cases.

And that's on a local reasonably fast NVMe - with networked storage we'll see
much bigger wins.

This also doesn't just repro with toy queries, e.g. TPCH Q02 shows a 2X
improvement too (with either patch) - the only reason it's not bigger is that
all the remaining IO time is on the inner side of a nestloop that isn't
currently prefetchable.

Peter, it'd be rather useful if your patch also had an enable/disable GUC,
otherwise it's more work to study the performance effects. The
effective_io_concurrency approach isn't great, because it also affects
bitmap scans, seqscans etc.

Just playing around, there are many cases where there is effectively no
difference between the two approaches, from a runtime perspective. There,
unsurprisingly, are some where the complex approach clearly wins, mostly
around IN(list-of-constants) so far.

Looking at the actual patches now.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message DINESH NAIR 2025-07-16 20:10:16 Re: Composite types for updatable views
Previous Message Tom Lane 2025-07-16 19:36:35 Re: Fix PQport to never return NULL if the connection is valid