| From: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Melanie Plageman <melanieplageman(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Tomas Vondra <tv(at)fuzzy(dot)cz> |
| Subject: | Re: AIO / read stream heuristics adjustments for index prefetching |
| Date: | 2026-04-02 12:30:26 |
| Message-ID: | CAN55FZ3GT+o545U9JLR0AC5JtBznqJP4Mf9Mi4k3axqqAXTxPg@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,
On Thu, 2 Apr 2026 at 02:20, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> I've pushed what was 0001 and 0002. Will push the former 0003 shortly.
0001 LGTM.
0002: read_stream: Prevent distance from decaying too quickly
+ /*
+ * As we needed IO, prevent distance from being reduced within our
+ * maximum look-ahead window. This avoids having distance collapse too
+ * quickly in workloads where most of the required blocks are cached,
+ * but where the remaining IOs are a sufficient enough factor to cause
+ * a substantial slowdown if executed synchronously.
+ *
+ * There are valid arguments for preventing decay for max_ios or for
+ * max_pinned_buffers. But the argument for max_pinned_buffers seems
+ * clearer - if we can't see any misses within the maximum look-ahead
+ * distance, we can't do any useful read-ahead.
+ */
+ stream->distance_decay_holdoff = stream->max_pinned_buffers;
That is already committed, but I have a question: did you think about
setting stream->distance_decay_holdoff to the current stream->distance?
That would also fix the 'miss followed by a hit' case (it wouldn't fix
a double miss followed by a hit, though).
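To make sure we are talking about the same semantics, here is a toy model of how I read the holdoff (set on a miss, counted down per hit, decay only once it reaches zero), with the suggested stream->distance assignment. The struct and functions are stand-ins I made up for illustration, not the real read_stream code:

```c
#include <assert.h>

/* Toy stand-in for the relevant ReadStream fields; not the real struct. */
struct toy_stream
{
	int		distance;
	int		max_pinned_buffers;
	int		distance_decay_holdoff;
};

static void
toy_miss(struct toy_stream *stream)
{
	/* Suggested variant: hold off decay within the current distance. */
	stream->distance_decay_holdoff = stream->distance;
}

static void
toy_hit(struct toy_stream *stream)
{
	if (stream->distance_decay_holdoff > 0)
		stream->distance_decay_holdoff--;	/* still within holdoff, no decay */
	else if (stream->distance > 1)
		stream->distance--;					/* decay */
}
```

With this, a miss followed by up to stream->distance hits leaves the distance untouched, which is the "miss followed by a hit" case above.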
0003: aio: io_uring: Trigger async processing for large IOs
There is a small optimization opportunity: we don't need to calculate
io_size for direct IO, since pgaio_uring_should_use_async() will always
return false in that case. Do you think it is worth implementing? Other
than that, LGTM.
0004: read_stream: Only increase distance when waiting for IO
LGTM.
0005: WIP: read_stream: Move logic about IO combining & issuing to helpers
/* never pin more buffers than allowed */
if (stream->pinned_buffers + stream->pending_read_nblocks >=
stream->max_pinned_buffers)
return false;
/*
* Don't start more read-ahead if that'd put us over the distance limit
* for doing read-ahead.
*/
if (stream->pinned_buffers + stream->pending_read_nblocks >=
stream->distance)
return false;
AFAIK, stream->distance can't be higher than
stream->max_pinned_buffers [1], so I think we can remove the first if
block. If we are not sure about [1], stream->max_pinned_buffers will
most likely still be higher than stream->distance, so it would make
sense to swap the order of these checks.
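To make the reordering suggestion concrete, here is a compile-and-run sketch. The struct below is a minimal stand-in containing only the fields from the quoted hunk, not the real ReadStream:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the fields used in the quoted checks. */
struct stub_stream
{
	int		pinned_buffers;
	int		pending_read_nblocks;
	int		distance;
	int		max_pinned_buffers;
};

static bool
stub_can_look_ahead(struct stub_stream *stream)
{
	/*
	 * Check the distance limit first: if distance can never exceed
	 * max_pinned_buffers, this check subsumes the pin-limit one; even
	 * if it can, this is the branch that will usually fire.
	 */
	if (stream->pinned_buffers + stream->pending_read_nblocks >=
		stream->distance)
		return false;

	/* never pin more buffers than allowed */
	if (stream->pinned_buffers + stream->pending_read_nblocks >=
		stream->max_pinned_buffers)
		return false;

	return true;
}
```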
Aha, I understood this change after looking at 0006. It is a nitpick,
but the 'if (stream->pinned_buffers + stream->pending_read_nblocks >=
stream->max_pinned_buffers)' block could be added in 0006 instead.
Other than these, LGTM.
0006: WIP: read stream: Split decision about look ahead for AIO and combining
I liked the idea of being more aggressive about IO combining. What is
the reason for gradually increasing combine_distance? Is it to avoid
issuing unnecessary IOs at the start?
+ /*
+ * XXX: Should we actually reduce this at any time other than
+ * a reset? For now we have to, as this is also a condition
+ * for re-enabling fast_path.
+ */
+ if (stream->combine_distance > 1)
+ stream->combine_distance--;
I don't think we need to reduce this anywhere other than at reset.
+ /*
+ * Allow looking further ahead if we are in the process of building a
+ * larger IO, the IO is not yet big enough and we don't yet have IO in
+ * flight. Note that this is allowed even if we are reaching the
+ * read-ahead limit (but not the buffer pin limit).
+ *
+ * This is important for cases where either effective_io_concurrency is
+ * low or we never need to wait for IO and thus are not increasing the
+ * distance. Without this we would end up with lots of small IOs.
+ */
+ if (stream->pending_read_nblocks > 0 &&
+ stream->pinned_buffers == 0 &&
+ stream->pending_read_nblocks < stream->combine_distance)
+ return true;
Do we need to check 'stream->pinned_buffers == 0' here? I think we can
continue to look ahead even when there are pinned buffers, as we
already know 'stream->pinned_buffers + stream->pending_read_nblocks <
stream->max_pinned_buffers'.
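In other words, something like the sketch below, which keeps the pin-limit check but drops the 'pinned_buffers == 0' requirement. Again, the struct is a made-up stand-in, not the real ReadStream:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the fields used in the quoted hunk. */
struct combine_stub
{
	int		pinned_buffers;
	int		pending_read_nblocks;
	int		combine_distance;
	int		max_pinned_buffers;
};

static bool
stub_keep_building_io(struct combine_stub *stream)
{
	/* still respect the buffer pin limit, as before */
	if (stream->pinned_buffers + stream->pending_read_nblocks >=
		stream->max_pinned_buffers)
		return false;

	/* keep growing the pending IO even with buffers already pinned */
	return stream->pending_read_nblocks > 0 &&
		stream->pending_read_nblocks < stream->combine_distance;
}
```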
+ /* same if capped not by io_combine_limit but combine_distance */
+ if (stream->combine_distance > 0 &&
+ pending_read_nblocks >= stream->combine_distance)
+ return true;
I think we don't need to check 'stream->combine_distance > 0'; it
shouldn't be less than or equal to zero when
'stream->readahead_distance' is non-zero, right?
+ {
+ stream->readahead_distance = Min(max_pinned_buffers,
+ stream->io_combine_limit);
+ stream->combine_distance = stream->io_combine_limit;
+ }
I think stream->combine_distance should be set equal to
stream->readahead_distance, as we don't want to surpass
max_pinned_buffers.
0007: Hacky implementation of making read_stream_reset()/end() not wait for IO
Read stream parts LGTM. I don't have a comment on the FIXME part, though.
--
Regards,
Nazir Bilal Yavuz
Microsoft