Re: AIO v2.5

From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at>
Subject: Re: AIO v2.5
Date: 2025-05-09 01:06:18
Message-ID: mvpm7ga3dfgz7bvum22hmuz26cariylmcppb3irayftc7bwk3r@l7gb6gr7azhc
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-05-02 20:05:11 -0700, Noah Misch wrote:
> On Wed, Apr 30, 2025 at 04:00:35PM -0400, Andres Freund wrote:
> > pgaio_io_wait_for_free() does what it says on the tin. For that, after a bunch
> > of other things, finds the oldest in-flight IO and waits for it.
> >
> > PgAioHandle *ioh = dclist_head_element(PgAioHandle, node,
> > &pgaio_my_backend->in_flight_ios);
> >
> > switch (ioh->state)
> > {
> > ...
> > case PGAIO_HS_COMPLETED_IO:
> > case PGAIO_HS_SUBMITTED:
> > pgaio_debug_io(DEBUG2, ioh,
> > "waiting for free io with %d in flight",
> > dclist_count(&pgaio_my_backend->in_flight_ios));
> > ...
> > pgaio_io_wait(ioh, ioh->generation);
> > break;
> >
> >
> > The problem is that, if the log level is low enough, ereport() (which is
> > called by pgaio_debug_io()), processes interrupts. The interrupt processing
> > may end up execute ProcessBarrierSmgrRelease(), which in turn needs to wait
> > for all in-flight IOs before the IOs are closed.
> >
> > Which then leads to the
> > elog(PANIC, "waiting for own IO in wrong state: %d",
> > state);
> >
> > error.
>
> Printing state 0 (PGAIO_HS_IDLE), right?

Correct.

>I think the chief problem is that pgaio_io_wait_for_free() is fetching
>ioh->state, then possibly processing interrupts in pgaio_debug_io(), then
>finally fetching ioh->generation. If it fetched ioh->generation to a local
>variable before pgaio_debug_io, I think that would resolve this one.

That's what I also concluded after playing around with a few different
approaches.

> Then the pgaio_io_was_recycled() would prevent the PANIC:
>
> if (pgaio_io_was_recycled(ioh, ref_generation, &state))
> return;
>
> if (am_owner)
> {
> if (state != PGAIO_HS_SUBMITTED
> && state != PGAIO_HS_COMPLETED_IO
> && state != PGAIO_HS_COMPLETED_SHARED
> && state != PGAIO_HS_COMPLETED_LOCAL)
> {
> elog(PANIC, "waiting for own IO in wrong state: %d",
> state);
> }
> }
>
> Is that right?

Yes, it avoids the issue.

> If that's the solution, pgaio_closing_fd() and pgaio_shutdown() would need
> similar care around fetching the generation before the pgaio_debug_io.
> Maybe there's an opportunity for a common inline function. Or at least a
> comment at the "generation" field on how to safely time a fetch thereof and
> any barrier required.

I didn't see a good way to move this into an inline function, unfortunately.

We do need to hold interrupts in a few other places, I think - with some debug
infrastructure (things like calling ProcessBarrierSmgrRelease() whenever
interrupts could be processed and calling CFI() in errstart() in its return
false case) it's possible to find state confusions which trigger
assertions. The issue is that pgaio_io_update_state() contains a
pgaio_debug_io() and executing pgaio_closing_fd() in places that call
pgaio_io_update_state() doesn't end well. There's a similar danger with the
debug message in pgaio_io_reclaim().

In the attached patch I added an assertion to pgaio_io_update_state()
verifying that interrupts are held and added code to hold interrupts in the
relevant places.

> > I'm not yet sure how to best fix it - locally I have done so by pgaio_debug()
> > do a HOLD_INTERRUPTS()/RESUME_INTERRUPTS() around the call to ereport. But
> > that doesn't really seem great - otoh requiring various pieces of code to know
> > that anything emitting debug messages needs to hold interrupts etc makes for
> > rare and hard to understand bugs.
> >
> > We could just make the relevant functions hold interrupts, and that might be
> > the best path forward, but we don't really need to hold all interrupts
> > (e.g. termination would be fine), so it's a bit coarse grained. It would need
> > to happen in a few places, which isn't great either.
> >
> > Other suggestions?
>
> For the "no free IOs despite no in-flight IOs" case, I'd replace the
> ereport(ERROR) with "return;" since we now know interrupt processing reclaimed
> an IO.

Hm - it seems better to me to check if there are now free handles and return
if that's the case, but to keep the error check in case there actually is no
free IO? That seems like a not implausible bug...

> Then decide what protection if any, we need against bugs causing an
> infinite loop in caller pgaio_io_acquire(). What's the case motivating the
> unbounded loop in pgaio_io_acquire(), as opposed to capping at two
> pgaio_io_acquire_nb() calls? If the theory is that pgaio_io_acquire() could
> be reentrant, what scenario would reach that reentrancy?

I do not remember why I wrote this as an endless loop. If you prefer I could
change that as part of this patch.

It does seem rather dangerous that errstart() processes interrupts for debug
messages, but only if the debug message is actually logged. That's really a
recipe for hard to find bugs. I wonder if we should, at least in assertion
mode, process interrupts even if not emitting the message.

Greetings,

Andres Freund

Attachment Content-Type Size
v1-0001-WIP-aio-Fix-possible-state-confusion-due-to-inter.patch text/x-diff 10.0 KB
v1-0001-WIP-aio-Fix-possible-state-confusion-due-to-inter.patch text/x-diff 10.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2025-05-09 01:15:04 Re: AIO v2.5
Previous Message Tom Lane 2025-05-09 00:15:24 Re: disabled SSL log_like tests