From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Subject: | Re: index prefetching |
Date: | 2025-09-03 19:33:30 |
Message-ID: | CAH2-WznFdjY_OB2S7_BY4iAyeffK+XrE2qsX6aghgP63VocRfQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Sep 3, 2025 at 2:47 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I still don't think I fully understand why the impact of this is so large. The
> branch misses appear to be the only thing differentiating the two cases, but
> with resowners neutralized, the remaining difference in branch misses seems
> too large - it's not like the sequence of block numbers is more predictable
> without prefetching...
>
> The main increase in branch misses is in index_scan_stream_read_next...
I've been working on fixing the same regressed query, but using a
completely different (though likely complementary) approach: by adding
a test to index_scan_stream_read_next that detects when prefetching
isn't favorable. If it isn't favorable, then we stop prefetching
entirely (we fall back on regular sync I/O).
Although this experimental approach is still very rough, it seems
promising. It fixes the problem at hand almost completely, without
really creating any new problems (at least as far as our testing has
been able to determine so far).
The key idea is to wait until a few batches have already been read,
and then test whether the index-tuple-wise "distance" between readPos
(the read position) and streamPos (the stream position used by
index_scan_stream_read_next) remained excessively low within
index_scan_stream_read_next. If, after processing 20 batches/leaf
pages, readPos and streamPos still read from the same batch *and* are
only a small index-tuple-wise distance apart within that batch
(they're within 10 or 20 items of each other), we expect "thrashing",
which makes prefetching unfavorable -- and so we just stop using our
read stream.
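
A minimal sketch of the test described above, for illustration only. It is
not code from the patch: the ScanPos struct, the function name, and both
thresholds are hypothetical, chosen to mirror the rough numbers given here
(20 batches, a gap of 10 or 20 items). It also assumes that streamPos never
falls behind readPos.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scan position: which batch (leaf page), which item in it. */
typedef struct ScanPos
{
    int batch;  /* sequence number of the batch/leaf page */
    int item;   /* offset of the index tuple within that batch */
} ScanPos;

/* Illustrative thresholds only, taken from the rough figures in the text. */
#define MIN_BATCHES_BEFORE_TEST 20
#define MAX_THRASHING_GAP       20

/*
 * Returns true when prefetching looks unfavorable: enough batches have been
 * processed, yet the stream position is still in the same batch as the read
 * position and only a few index tuples ahead of it.
 */
static bool
prefetch_looks_unfavorable(int batches_processed,
                           ScanPos readPos, ScanPos streamPos)
{
    /* Assumption: the stream position is never behind the read position. */
    assert(streamPos.batch > readPos.batch ||
           (streamPos.batch == readPos.batch &&
            streamPos.item >= readPos.item));

    /* Too early to judge: wait until a few batches have been read. */
    if (batches_processed < MIN_BATCHES_BEFORE_TEST)
        return false;

    /* The stream has moved on to a later batch, so it is getting ahead. */
    if (streamPos.batch != readPos.batch)
        return false;

    /* Same batch: unfavorable only if the index-tuple-wise gap is small. */
    return (streamPos.item - readPos.item) <= MAX_THRASHING_GAP;
}
```

A caller in the spirit of this approach would stop using the read stream and
fall back on regular sync I/O once this returns true.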
It's worth noting that (given the current structure of the patch) it
is inherently impossible to do something like this from within the
read stream. We're suppressing duplicate heap block requests iff the
blocks are contiguous within the index. So read stream just doesn't
see anything like what I'm calling the "index-tuple-wise distance"
between readPos and streamPos.
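
To illustrate why the read stream cannot see that distance, here is a rough
sketch of suppressing duplicate heap block requests only when they are
adjacent in index order. The BlockNum type and the function name are
hypothetical; this is not how the patch is written, just the general idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNum;  /* hypothetical stand-in for a heap block number */

/*
 * Copy heap block numbers (given in index order) into 'out', dropping a
 * request only when it repeats the immediately preceding block. Returns the
 * number of requests actually passed on.
 */
static size_t
collapse_adjacent_blocks(const BlockNum *blocks, size_t n, BlockNum *out)
{
    size_t nout = 0;

    assert(n == 0 || (blocks != NULL && out != NULL));

    for (size_t i = 0; i < n; i++)
    {
        /* Same block as the previous index tuple: no new request needed. */
        if (nout > 0 && out[nout - 1] == blocks[i])
            continue;
        out[nout++] = blocks[i];
    }
    return nout;
}
```

With index tuples pointing at heap blocks 7, 7, 7, 9, 7, only the requests
7, 9, 7 are passed on. The consumer of those requests sees three blocks and
has no way of knowing that five index tuples produced them.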
Note that the baseline behavior for the test case (the behavior with
master, or with prefetching disabled) appears to be very I/O bound,
due to OS readahead. I've confirmed this using iostat. So "synchronous"
I/O isn't very synchronous here. (Prefetching actually does make sense
when this query is run with direct I/O, but that's far slower with or
without the use of explicit prefetching, so that likely doesn't tell
us much.)
--
Peter Geoghegan