Re: AIO / read stream heuristics adjustments for index prefetching

From: Andres Freund <andres(at)anarazel(dot)de>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: AIO / read stream heuristics adjustments for index prefetching
Date: 2026-04-03 19:01:13
Message-ID: dyz5hwolszkdbztdag2arphj3esmx2y6ocdfdirryehkgintcj@i7hqar5btt4w
Lists: pgsql-hackers

Hi,

On 2026-04-03 12:45:50 -0400, Melanie Plageman wrote:
> On Thu, Apr 2, 2026 at 9:33 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > > + /*
> > > +  * XXX: Should we actually reduce this at any time other than
> > > +  * a reset? For now we have to, as this is also a condition
> > > +  * for re-enabling fast_path.
> > > +  */
> > > + if (stream->combine_distance > 1)
> > > +     stream->combine_distance--;
> > >
> > > I don't think we need to reduce this other than reset.
> >
> > Hm. I go back and forth on that one :)
>
> Separate from the fast-path enablement, we also probably want to
> decrease combine distance when we decrease readahead_distance because
> there is a point where we still want to parallelize the IOs even when
> the distance is lower and to do that, we need to make smaller IOs.

I'm not sure that's something we really need to worry about at this point. If
readahead_distance is so small that it does not allow enough IO concurrency,
we will have to wait for IO completion, which in turn will lead to the
readahead distance being increased again.

I can see some corner cases where this would not suffice, e.g. if you have a
rather low pin limit, but I doubt those are relevant in practice?
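
To make the feedback loop above concrete, here is a toy model. It is only a sketch and not the read_stream.c logic: the struct, the function names, the "double on wait, decrement on hit" policy and the single `needed_distance` threshold are all assumptions made for illustration. It shows a too-small distance recovering on its own, and the corner case where a low cap (standing in for a low pin limit) prevents that.

```c
#include <assert.h>

/*
 * Toy model only; every name here is hypothetical and none of this is
 * taken from read_stream.c.
 */
typedef struct ToyStream
{
    int distance;       /* current look-ahead distance, in blocks */
    int max_distance;   /* cap, standing in for e.g. a pin limit */
} ToyStream;

/* The consumer had to wait for an IO: look further ahead (assumed policy). */
void
toy_on_io_wait(ToyStream *s)
{
    assert(s->distance >= 1 && s->max_distance >= 1);
    s->distance *= 2;
    if (s->distance > s->max_distance)
        s->distance = s->max_distance;
}

/* The block was ready without waiting: decay slowly (assumed policy). */
void
toy_on_no_wait(ToyStream *s)
{
    if (s->distance > 1)
        s->distance--;
}

/*
 * Consume `steps` blocks. Assumption: the consumer has to wait whenever the
 * distance is below `needed_distance`, i.e. whenever the look-ahead is too
 * small to provide enough IO concurrency. Returns the final distance.
 */
int
toy_run(ToyStream *s, int needed_distance, int steps)
{
    for (int i = 0; i < steps; i++)
    {
        if (s->distance < needed_distance)
            toy_on_io_wait(s);
        else
            toy_on_no_wait(s);
    }
    return s->distance;
}
```

Starting from a distance of 1 with a cap of 64 and a needed distance of 16, the model climbs back above 16 within a few steps and then hovers there. With a cap of 4 it stays pinned at 4 and waits on every block, which is the low-pin-limit corner case mentioned above.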

> I'm not sure where this point is, but I wonder if a few 256kB IOs are faster
> than one 1MB IO (could test that with fio actually).

Yes, that point definitely exists. But I think the mechanism for that is to
configure io_combine_limit at or below the threshold at which even bigger IOs
hurt.
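
As a rough illustration of how a combine limit bounds IO size, the sketch below counts the IOs needed to read a contiguous run of blocks. The helper name is hypothetical, the limit is expressed in blocks, and the arithmetic assumes the default 8kB block size; it is not code from PostgreSQL.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of IOs needed to read `nblocks` contiguous
 * blocks when no single IO may cover more than `combine_limit` blocks.
 */
int
toy_ios_for_run(int nblocks, int combine_limit)
{
    assert(nblocks >= 0 && combine_limit >= 1);
    /* Ceiling division: a partial trailing chunk still costs one IO. */
    return (nblocks + combine_limit - 1) / combine_limit;
}
```

With 8kB blocks, a 1MB run is 128 blocks: a limit of 128 blocks gives one 1MB IO, while a limit of 32 blocks gives four 256kB IOs that can be in flight concurrently.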

> I imagine that there is some size where that is true because of
> peculiarities in how drives (and cloud storage) issue/break up IOs after
> they are a certain size, etc.

It's even true for synchronous copies from the kernel page cache, due to some
hardware issue I have yet to fully understand. On both Intel and AMD CPUs,
unless SMAP is disabled, larger copies from kernel to userspace start to be
substantially slower, somewhere around 1-4MB per IO.

Greetings,

Andres Freund
