Re: Non-reproducible AIO failure

From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Non-reproducible AIO failure
Date: 2025-06-12 14:22:22
Message-ID: 3cf6e6ff-2fd3-4eec-b1c5-4cd2de9e75b2@garret.ru
Lists: pgsql-hackers


On 12/06/2025 4:57 pm, Andres Freund wrote:
> The problem appears to be in that switch between "when submitted, by the IO
> worker" and "then again by the backend". It's not concurrent access in the
> sense of two processes writing to the same value, it's that when switching
> from the worker updating ->distilled_result to the issuer looking at that, the
> issuer didn't ensure that no outdated version of ->distilled_result could be
> used.
>
> Basically, the problem is that the worker would
>
> 1) set ->distilled_result
> 2) perform a write memory barrier
> 3) set ->state to COMPLETED_SHARED
>
> and then the issuer of the IO would:
>
> 4) check ->state is COMPLETED_SHARED
> 5) use ->distilled_result
>
> The problem is that there currently is no barrier between 4 & 5, which means
> an outdated ->distilled_result could be used.
>
>
> This also explains why the issue looked so weird - eventually, after fprintfs,
> after a core dump, etc, the updated ->distilled_result result would "arrive"
> in the issuing process, and suddenly look correct.

Thank you very much for the explanation.
Everything seems so simple once it is explained that it is hard to
believe that, beforehand, such behavior looked like it could only be
caused by "black magic" or an "OS bug" :)

Certainly, using an outdated result can explain such behavior.
But in which particular place do we lose the read barrier between 4 and 5?
I see `pgaio_io_wait`, which I expect is called by the backend to
wait for completion of the IO.
It calls `pgaio_io_was_recycled` to get the state, which in turn enforces
a read barrier:
```

bool
pgaio_io_was_recycled(PgAioHandle *ioh, uint64 ref_generation,
                      PgAioHandleState *state)
{
    *state = ioh->state;
    pg_read_barrier();

    return ioh->generation != ref_generation;
}
```
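
For reference, the ordering in steps 1 to 5 of the quoted message, with a read barrier placed between 4 and 5, can be sketched in portable C11. This is only an illustration under stated assumptions: `DemoHandle`, `demo_worker`, `demo_wait_and_read` and `run_demo` are hypothetical names, not PostgreSQL code; C11 release/acquire fences stand in for `pg_write_barrier()`/`pg_read_barrier()`; and a thread stands in for the IO worker process.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for an AIO handle; NOT the real PgAioHandle. */
enum { DEMO_IN_FLIGHT = 0, DEMO_COMPLETED_SHARED = 1 };

typedef struct
{
    int        distilled_result; /* plain field, published via state */
    atomic_int state;            /* DEMO_IN_FLIGHT or DEMO_COMPLETED_SHARED */
} DemoHandle;

/* Worker side: steps 1 to 3. */
static void *
demo_worker(void *arg)
{
    DemoHandle *h = arg;

    h->distilled_result = 42;                       /* 1) set the result */
    atomic_thread_fence(memory_order_release);      /* 2) write barrier */
    atomic_store_explicit(&h->state, DEMO_COMPLETED_SHARED,
                          memory_order_relaxed);    /* 3) set the state */
    return NULL;
}

/* Issuer side: steps 4 and 5, with the barrier between them. */
static int
demo_wait_and_read(DemoHandle *h)
{
    /* 4) check the state (spinning here only to keep the demo short) */
    while (atomic_load_explicit(&h->state, memory_order_relaxed)
           != DEMO_COMPLETED_SHARED)
        ;

    /* Read barrier: without it, the load below may observe stale data. */
    atomic_thread_fence(memory_order_acquire);

    return h->distilled_result;                     /* 5) use the result */
}

/* Runs one worker thread and returns the result the issuer observed. */
int
run_demo(void)
{
    DemoHandle h;
    pthread_t  tid;
    int        rc;
    int        result;

    h.distilled_result = 0;
    atomic_init(&h.state, DEMO_IN_FLIGHT);

    rc = pthread_create(&tid, NULL, demo_worker, &h);
    assert(rc == 0);

    result = demo_wait_and_read(&h);
    pthread_join(tid, NULL);
    return result;
}
```

With both fences in place, the issuer is guaranteed to see the value stored before the state change. Dropping the acquire fence removes that guarantee on weakly ordered hardware, which matches the stale `->distilled_result` symptom described in the quoted text.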
