| From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
|---|---|
| To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, Tomas Vondra <tv(at)fuzzy(dot)cz> |
| Subject: | Re: Don't synchronously wait for already-in-progress IO in read stream |
| Date: | 2026-01-23 21:03:53 |
| Message-ID: | CAAKRu_atdnPCCy=kfxqWT62Ckaiz3G5t=S97tW24CuL3i3fFfQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Sun, Nov 9, 2025 at 5:21 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> I suppose (or perhaps vaguely recall from an off-list discussion?)
> that you must have considered merging the new
> is-it-already-in-progress check into ReadBuffersCanStartIO(). I
> suppose the nowait argument would become a tri-state argument with a
> value that means "don't wait for an in-progress read, just give me the
> IO handle so I can 'join' it as a foreign waiter", with a new output
> argument to receive the handle, or something along those lines, and I
> guess you'd need a tri-state result, and perhaps s/Can/Try/ in the
> name. That'd remove the double-check (extra header lock-unlock cycle)
> and associated race that can cause that rare synchronous wait (which
> must still happen sometimes in the duelling concurrent scan use
> case?), at the slight extra cost of having to allocate and free a
> handle in the case of repeated blocks (eg the index->heap scan use
> case), but at least that's just backend-local list pushups and doesn't
> do extra work otherwise. Is there some logical problem with that
> approach? Is the code just too clumsy?
Attached v3 basically does what you suggested above. Now, we should
only have to wait if the backend encounters a buffer after another
backend has set BM_IO_IN_PROGRESS but before that other backend has
set the buffer descriptor's wait reference.
0001 and 0002 are Andres' test-related patches. 0003 is a change I
think is required to make one of the tests stable (esp on the BSDs).
0004 is a bit of preliminary refactoring and 0005 is Andres' foreign
IO concept but with your suggested structure and my suggested styling.
I could potentially break out more into smaller refactoring commits,
but I don't think it's too bad the way it is.
A few things about the patch that I'm not sure about:
- I don't know if pgaio_submit_staged() is in all the right places
(and not in too many places). I basically do it before we would wait
when starting read IO on the buffer. In the permanent buffers case,
that's now only when BM_IO_IN_PROGRESS is set but the wait reference
isn't valid yet. This can't happen in the temporary buffers case, so
I'm not sure we need to call pgaio_submit_staged().
- StartBufferIO() is no longer invoked in the AsyncReadBuffers() path.
We could refactor it so that it works for AsyncReadBuffers(), but that
would involve returning something that distinguishes between
IO_IN_PROGRESS and IO already done. And StartBufferIO()'s comment
explicitly says it wants to avoid that.
If we keep my structure, with AsyncReadBuffers() using its own helper
(PrepareNewReadBufferIO()) instead of StartBufferIO(), then it seems
like we need some way to make it clear what StartBufferIO() is for.
I'm not sure what would collectively describe its current users,
though. It also now has no non-test callers passing nowait as true.
However, once we add write combining, it will, so it seems like we
should leave it the way it is to avoid churn. However, other
developers might be confused in the interim.
- In the 004_read_stream tests, I wonder if there is a way to test
that we don't wait for foreign IO until WaitReadBuffers(). We have
tests for the stream accessing the same block, which in some cases
will exercise the foreign IO path. But it doesn't distinguish between
the old behavior -- waiting for the IO to complete when starting read
IO on it -- and the new behavior -- not waiting until
WaitReadBuffers(). That may not be possible to test, though.
- Melanie
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-aio-Refactor-tests-in-preparation-for-more-tests.patch | text/x-patch | 10.7 KB |
| v3-0002-test_aio-Add-read_stream-test-infrastructure-test.patch | text/x-patch | 22.8 KB |
| v3-0003-fix-test.patch | text/x-patch | 1.2 KB |
| v3-0004-Make-buffer-hit-helper.patch | text/x-patch | 5.8 KB |
| v3-0005-Don-t-wait-for-already-in-progress-IO.patch | text/x-patch | 20.6 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Dharin Shah | 2026-01-23 21:08:58 | Re: [PATCH] tests: verify renamed index functionality in alter_table |
| Previous Message | Sami Imseih | 2026-01-23 20:31:26 | Re: Optional skipping of unchanged relations during ANALYZE? |