From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Subject: | Re: index prefetching |
Date: | 2025-08-14 20:44:14 |
Message-ID: | kvyser45imw3xmisfvpeoshisswazlzw35el3fq5zg73zblpql@f56enfj45nf7 |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2025-08-14 15:45:26 -0400, Peter Geoghegan wrote:
> On Thu, Aug 14, 2025 at 3:15 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > Then why does the exact same pair of runs show "I/O Timings: shared
> > read=194.629" for the sequential table backwards scan (with total
> > execution time 1132.360 ms), versus "I/O Timings: shared read=352.88"
> > (with total execution time 697.681 ms) for the random table backwards
> > scan?
>
> If you're interested in trying this out for yourself, I've pushed my
> working branch here:
>
> https://github.com/petergeoghegan/postgres/tree/index-prefetch-batch-v1.2
>
> Note that the test case you'll run is added by the most recent commit:
>
> https://github.com/petergeoghegan/postgres/commit/c9ceb765f3b138f53b7f1fdf494ba7c816082aa1
>
> Run microbenchmarks/random_backwards_weird.sql to do an initial load
> of both of the tables. Then run
> microbenchmarks/queries_random_backwards_weird.sql to actually run the
> relevant queries. There are 4 such queries, but only the 2 backwards
> scan queries really seem relevant.
Interesting. In the sequential case I see some waits that are not attributed
in explain, due to the waits happening within WaitIO(), not WaitReadBuffers().
Which indicates that the read stream is trying to re-read a buffer that
previously started being read.
read_stream_start_pending_read()
-> StartReadBuffers()
-> AsyncReadBuffers()
-> ReadBuffersCanStartIO()
-> StartBufferIO()
-> WaitIO()
There are far fewer cases of this in the random case.
From what I can tell the sequential case so often will re-read a buffer that
it is already in the process of reading - and thus wait for that IO before
continuing - that we don't actually keep enough IO in flight.
In your email with iostat output you can see that the slow case has
aqu-sz=5.18, while the fast case has aqu-sz=10.06, i.e. the fast case has
twice as much IO in flight. While both have IOs take the same amount of time
(r_await=0.20). Which certainly explains the performance difference...
We can optimize that by deferring the StartBufferIO() if we're encountering a
buffer that is undergoing IO, at the cost of some complexity. I'm not sure
real-world queries will often encounter the pattern of the same block being
read in by a read stream multiple times in close proximity sufficiently often
to make that worth it.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2025-08-14 21:06:07 | Re: index prefetching |
Previous Message | Masahiko Sawada | 2025-08-14 20:40:25 | Re: POC: Parallel processing of indexes in autovacuum |