Re: AIO / read stream heuristics adjustments for index prefetching

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: AIO / read stream heuristics adjustments for index prefetching
Date: 2026-03-31 19:31:39
Message-ID: CAAKRu_YvSjYvKtsTsRqm0=Jy0QhpUTsccRd66BsSKrdN-7ZA1g@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 31, 2026 at 12:02 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> 0001+0002: Return whether WaitReadBuffers() needed to wait
>
> The first patch allows pgaio_wref_check_done() to work more reliably with
> io_uring. Until now it only was able to return true if userspace already
> had consumed the kernel's completion event, but returned false otherwise.
> That's not really incorrect, just suboptimal.
>
> The second patch returns whether WaitReadBuffers() needed to wait for
> IO. This is useful for a) instrumentation like in [2] and b) to provide
> information to the read_stream heuristics to control how aggressive to
> perform read ahead.

These both look good to me except in 0001 you left an XXX in
pgaio_wref_check_done() that I think is the very thing that commit
does.

> 0003: read_stream: Issue IO synchronously while in fast path

LGTM

> 0004: read_stream: Prevent distance from decaying too quickly
>
> There are two minor questions here:
> - Should read_stream_pause()/read_stream_resume() restore the "holdoff"
> counter? I doubt it matters for the prospective user, since it will
> only be used when the lookahead distance is very large.

I don't really understand this. We have to do this with distance
because we set it to 0 and use distance == 0 to indicate stream end.
read_stream_pause() doesn't set the distance_decay_holoff to 0. If you
mean, should we reset holdoff to its initial value, then I don't think
so. I imagine that users doing a lot of pause and resume may not have
high distance.

> - For how long to hold off distance reductions? Initially I was torn
> between using "max_pinned_buffers" (Min(max_ios * io_combine_limit,
> cap)) and "max_ios" ([maintenance_]effective_io_concurrency). But I
> think the former makes more sense, as we otherwise won't allow for far
> enough readahead when doing IO combining, and it does seem to make sense
> to hold off decay for long enough that the maximum lookahead could not
> theoretically allow us to start an IO.

I agree. 0004 LGTM otherwise.

- Melanie

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2026-03-31 19:40:14 Re: Initial COPY of Logical Replication is too slow
Previous Message Masahiko Sawada 2026-03-31 19:29:37 Re: Initial COPY of Logical Replication is too slow