From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at> |
Subject: | Re: AIO v2.5 |
Date: | 2025-05-03 03:05:11 |
Message-ID: | 20250503030511.9b.nmisch@google.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Apr 30, 2025 at 04:00:35PM -0400, Andres Freund wrote:
> pgaio_io_wait_for_free() does what it says on the tin. For that, after a bunch
> of other things, finds the oldest in-flight IO and waits for it.
>
> PgAioHandle *ioh = dclist_head_element(PgAioHandle, node,
> &pgaio_my_backend->in_flight_ios);
>
> switch (ioh->state)
> {
> ...
> case PGAIO_HS_COMPLETED_IO:
> case PGAIO_HS_SUBMITTED:
> pgaio_debug_io(DEBUG2, ioh,
> "waiting for free io with %d in flight",
> dclist_count(&pgaio_my_backend->in_flight_ios));
> ...
> pgaio_io_wait(ioh, ioh->generation);
> break;
>
>
> The problem is that, if the log level is low enough, ereport() (which is
> called by pgaio_debug_io()), processes interrupts. The interrupt processing
> may end up execute ProcessBarrierSmgrRelease(), which in turn needs to wait
> for all in-flight IOs before the IOs are closed.
>
> Which then leads to the
> elog(PANIC, "waiting for own IO in wrong state: %d",
> state);
>
> error.
Printing state 0 (PGAIO_HS_IDLE), right? I think the chief problem is that
pgaio_io_wait_for_free() is fetching ioh->state, then possibly processing
interrupts in pgaio_debug_io(), then finally fetching ioh->generation. If it
fetched ioh->generation to a local variable before pgaio_debug_io, I think
that would resolve this one. Then the pgaio_io_was_recycled() would prevent
the PANIC:
if (pgaio_io_was_recycled(ioh, ref_generation, &state))
return;
if (am_owner)
{
if (state != PGAIO_HS_SUBMITTED
&& state != PGAIO_HS_COMPLETED_IO
&& state != PGAIO_HS_COMPLETED_SHARED
&& state != PGAIO_HS_COMPLETED_LOCAL)
{
elog(PANIC, "waiting for own IO in wrong state: %d",
state);
}
}
Is that right? If that's the solution, pgaio_closing_fd() and
pgaio_shutdown() would need similar care around fetching the generation before
the pgaio_debug_io. Maybe there's an opportunity for a common inline
function. Or at least a comment at the "generation" field on how to safely
time a fetch thereof and any barrier required.
> A similar set of steps can lead to the "no free IOs despite no in-flight IOs"
> ERROR that Alexander also observed - if pgaio_submit_staged() triggers a debug
> ereport that executes ProcessBarrierSmgrRelease() in an interrupt, we might
> wait for all in-flight IOs during IO submission, triggering the error.
That makes sense.
> I'm not yet sure how to best fix it - locally I have done so by pgaio_debug()
> do a HOLD_INTERRUPTS()/RESUME_INTERRUPTS() around the call to ereport. But
> that doesn't really seem great - otoh requiring various pieces of code to know
> that anything emitting debug messages needs to hold interrupts etc makes for
> rare and hard to understand bugs.
>
> We could just make the relevant functions hold interrupts, and that might be
> the best path forward, but we don't really need to hold all interrupts
> (e.g. termination would be fine), so it's a bit coarse grained. It would need
> to happen in a few places, which isn't great either.
>
> Other suggestions?
For the "no free IOs despite no in-flight IOs" case, I'd replace the
ereport(ERROR) with "return;" since we now know interrupt processing reclaimed
an IO. Then decide what protection if any, we need against bugs causing an
infinite loop in caller pgaio_io_acquire(). What's the case motivating the
unbounded loop in pgaio_io_acquire(), as opposed to capping at two
pgaio_io_acquire_nb() calls? If the theory is that pgaio_io_acquire() could
be reentrant, what scenario would reach that reentrancy?
> Thanks again for finding and reporting this Alexander!
+1!
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2025-05-03 04:38:32 | Re: Make COPY format extendable: Extract COPY TO format implementations |
Previous Message | Sutou Kouhei | 2025-05-03 02:24:06 | Re: Make COPY format extendable: Extract COPY TO format implementations |