| From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, Tomas Vondra <tv(at)fuzzy(dot)cz> |
| Subject: | Re: Don't synchronously wait for already-in-progress IO in read stream |
| Date: | 2026-03-18 16:59:11 |
| Message-ID: | CAAKRu_bCZfWpmfmfUiQjDgFK12T8FUox-hV9YFDy6XHTh=i0ew@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Thanks for the review!
Everything you suggested that I don't elaborate on below, I've just
gone ahead and done in attached v6.
On Tue, Mar 17, 2026 at 1:26 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > --- a/src/backend/storage/buffer/bufmgr.c
> > +++ b/src/backend/storage/buffer/bufmgr.c
> > @@ -1990,7 +1990,7 @@ AsyncReadBuffers(ReadBuffersOperation *operation, int *nblocks_progress)
> > * must have started out as a miss in PinBufferForBlock(). The other
> > * backend will track this as a 'read'.
> > */
> > - TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, blocknum + operation->nblocks_done,
> > + TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, blocknum + operation->nblocks_done - 1,
> > operation->smgr->smgr_rlocator.locator.spcOid,
> > operation->smgr->smgr_rlocator.locator.dbOid,
> > operation->smgr->smgr_rlocator.locator.relNumber,
> > --
>
> Ah, the issue is that we already incremented nblocks_done, right? Maybe it'd
> be easier to understand if we stashed blocknum + nblocks_done into a local
> var, and use it in in both branches of if (!ReadBuffersCanStartIO())?
>
> This probably needs to be backpatched...
0003 in v6 does as you suggest. I'll backport it after a quick +1 here.
> > Subject: [PATCH v5 4/5] Make buffer hit helper
>
> > @@ -1236,17 +1238,6 @@ PinBufferForBlock(Relation rel,
> > persistence == RELPERSISTENCE_PERMANENT ||
> > persistence == RELPERSISTENCE_UNLOGGED));
> >
> > - if (persistence == RELPERSISTENCE_TEMP)
> > - {
> > - io_context = IOCONTEXT_NORMAL;
> > - io_object = IOOBJECT_TEMP_RELATION;
> > - }
> > - else
> > - {
> > - io_context = IOContextForStrategy(strategy);
> > - io_object = IOOBJECT_RELATION;
> > - }
> > -
>
> I'm mildly worried that this will lead to a bit worse code generation, the
> compiler might have a harder time figuring out that io_context/io_object
> doesn't change across multiple PinBufferForBlock calls. Although it already
> might be unable to do so (we don't mark IOContextForStrategy as
> pure [1]).
>
> I kinda wonder if, for StartReadBuffersImpl(), we should go the opposite
> direction, and explicitly look up IOContextForStrategy(strategy) *before* the
> actual_nblocks loop to make sure the compiler doesn't inject external function
> calls (which will in all likelihood require register spilling etc).
I added a separate patch to refactor the code to do this first (0004).
> > @@ -1254,18 +1245,11 @@ PinBufferForBlock(Relation rel,
> > smgr->smgr_rlocator.backend);
> >
> > if (persistence == RELPERSISTENCE_TEMP)
>
> And here it might end up adding a separate persistence == RELPERSISTENCE_TEMP
> branch in CountBufferHit(), I suspect the compiler may not be able to optimize
> it away.
And you think it is optimizing it away in PinBufferForBlock()?
> At the very least I'd invert the call to CountBufferHit() and the
> pgstat_count_buffer_read(), as the latter will probably prevent most
> optimizations (due to the compiler not being able to prove that
> (rel)->pgstat_info->counts.blocks_fetched is a different memory location as
> *foundPtr).
I did this. I did not check the compiled code before or after though.
> > +CountBufferHit(BufferAccessStrategy strategy,
> > + Relation rel, char persistence, SMgrRelation smgr,
> > + ForkNumber forknum, BlockNumber blocknum)
>
> I don't think "Count*" is a great name for something that does tracepoints and
> vacuum cost balance accounting, the latter actually changes behavior of the
> program due to the sleeps it injects.
>
> The first alternative I have is AccountForBufferHit(), not great, but still
> seems a bit better.
At some point, I had ProcessBufferHit(), but Bilal felt it implied the
function did more than counting. I've changed it now to
TrackBufferHit().
> > From 4d737fa14f333abc4ee6ade8cb0340530695e887 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Fri, 23 Jan 2026 14:00:31 -0500
> > Subject: [PATCH v5 5/5] Don't wait for already in-progress IO
>
> I wonder if might be worth splitting this up in a refactoring and a
> "behavioural change" commit. Might be too complicated.
>
> Candidates for a split seem to be:
> - Moving pgaio_io_acquire_nb() to earlier
> - Introduce PrepareNewReadBufferIO/PrepareAdditionalReadBuffer without support
> for READ_BUFFER_IN_PROGRESS
> - introduce READ_BUFFER_IN_PROGRESS
I've done something like this in v6.
> > + * Local version of PrepareNewReadBufferIO(). Here instead of localbuf.c to
> > + * avoid an external function call.
> > + */
> > +static PrepareReadBuffer_Status
> > +PrepareNewLocalReadBufferIO(ReadBuffersOperation *operation,
> > + Buffer buffer)
>
> Hm, seems the test in 0002 should be extended to cover the the temp table case.
I did this. However, I was a bit lazy in how many cases I added
because I used invalidate_rel_block(), which is pretty verbose (since
evict_rel() doesn't work yet for local buffers).
I don't think we'll be able to easily test READ_BUFFER_ALREADY_DONE
(though perhaps we aren't testing it for shared buffers either?).
> > +{
> > + BufferDesc *desc = GetLocalBufferDescriptor(-buffer - 1);
> > + uint64 buf_state = pg_atomic_read_u64(&desc->state);
> > +
> > + /* Already valid, no work to do */
> > + if (buf_state & BM_VALID)
> > + {
> > + pgaio_wref_clear(&operation->io_wref);
> > + return READ_BUFFER_ALREADY_DONE;
> > + }
>
> Is this reachable for local buffers?
Yes, I think this is reachable by local buffers that started the IO
already and then completed it when acquiring a new IO handle at the
top of AsyncReadBuffers().
> > + if (pgaio_wref_valid(&desc->io_wref))
> > + {
> > + operation->io_wref = desc->io_wref;
> > + operation->foreign_io = true;
> > + return READ_BUFFER_IN_PROGRESS;
> > + }
> > +
> > + /*
> > + * While it is possible for a buffer to have been prepared for IO but not
> > + * yet had its wait reference set, there's no way for us to know that for
> > + * temporary buffers. Thus, we'll prepare for own IO on this buffer.
> > + */
> > + return READ_BUFFER_READY_FOR_IO;
>
> Is that actually possible? And would it be ok to just do start IO in that
> case?
You're right, that's not possible for local buffers. For local
buffers, we "prepare for IO" by calling PrepareNewLocalReadBufferIO()
and then set the wait ref in a codepath initiated by calling
smgrstartreadv() as part of "staging" the IO. No one can observe that
buffer in between the call to PrepareNewLocalReadBufferIO() and
setting the wait reference. So, I've deleted the comment.
> > +static PrepareReadBuffer_Status
> > +PrepareNewReadBufferIO(ReadBuffersOperation *operation,
> > + Buffer buffer)
> > +{
>
> I'm not sure I love "New" here, compared to "Additional". Perhaps "Begin" &
> "Continue"? Or "First" & "Additional"? Or ...
I changed the names to PrepareHeadBufferReadIO() and
PrepareAdditionalBufferReadIO(). "Head" instead of "First" because
First felt like it implied the first buffer ever and head seems to
make it clear it is the first buffer of this new IO.
- Melanie
| Attachment | Content-Type | Size |
|---|---|---|
| v6-0001-aio-Refactor-tests-in-preparation-for-more-tests.patch | text/x-patch | 10.8 KB |
| v6-0002-test_aio-Add-read_stream-test-infrastructure-test.patch | text/x-patch | 24.1 KB |
| v6-0003-Fix-off-by-one-error-in-read-IO-tracing.patch | text/x-patch | 2.4 KB |
| v6-0004-Pass-io_object-and-io_context-through-to-PinBuffe.patch | text/x-patch | 3.4 KB |
| v6-0005-Make-buffer-hit-helper.patch | text/x-patch | 5.0 KB |
| v6-0006-Restructure-AsyncReadBuffers.patch | text/x-patch | 9.9 KB |
| v6-0007-Introduce-PrepareHeadBufferReadIO-and-PrepareAddi.patch | text/x-patch | 8.1 KB |
| v6-0008-AIO-Don-t-wait-for-already-in-progress-IO.patch | text/x-patch | 13.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2026-03-18 17:03:03 | Re: Enable -Wstrict-prototypes and -Wold-style-definition by default |
| Previous Message | Artur Zakirov | 2026-03-18 16:55:25 | Re: Order of InvokeObjectPostAlterHook within ATExecSetNotNull |