pgsql: read_stream: Prevent distance from decaying too quickly

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: read_stream: Prevent distance from decaying too quickly
Date: 2026-04-01 23:54:03
Message-ID: E1w85Nq-002Wi5-1b@gemulon.postgresql.org

read_stream: Prevent distance from decaying too quickly

Until now we reduced the look-ahead distance by 1 on every hit and doubled it
on every miss. That is problematic because there are very common IO patterns
where this prevents us from ever reaching a sufficiently high distance (e.g. a
repeating pattern of one miss followed by one hit never lets the distance grow
beyond 2). In many such cases, things would have been fine had we ever reached
a sufficient look-ahead distance, because we grow the distance faster than we
decrease it.

One might think that the most obvious answer to this problem would be to never
reduce the distance. However, that would not work well because, particularly
with upcoming users of read streams, it is reasonably common to first see a
lot of misses and then transition to a fully cached workload, e.g. because
the same blocks are needed repeatedly within one stream. Unnecessarily deep
readahead can be costly: it requires pinning many more buffers, which
increases CPU overhead.

Because the cost of a synchronously handled miss can be very high (multiple
milliseconds for every IO with commonly used storage) compared to the CPU
overhead of keeping the distance too high, we want to err on the side of not
reducing the distance too early.

The insight that decreasing the distance by 1 on every hit may be fine at
large distances, but not at small ones, shows a way out: if we only allow the
distance to decrease once there have been no misses for as many blocks as our
maximum look-ahead distance, we keep the distance high as long as readahead
has a chance to do IO asynchronously, but usually let it decay when it does
not.

Several folks have written variants of this patch, including at least Thomas
Munro, Melanie Plageman and me.

Reviewed-by: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Discussion: https://postgr.es/m/f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu
Discussion: https://postgr.es/m/CA+hUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BAdrZqKZkw@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-Wz%3DkMg3PNay96cHMT0LFwtxP-cQSRZTZzh1Cixxf8G%3Dzrw%40mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/6e36930f9aaffd5e097a01935e6f68ed851535ae

Modified Files
--------------
src/backend/storage/aio/read_stream.c | 36 ++++++++++++++++++++++++++++++++---
1 file changed, 33 insertions(+), 3 deletions(-)
