Re: Non-reproducible AIO failure

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Non-reproducible AIO failure
Date: 2025-06-06 19:37:45
Message-ID: kzasuj5yb7o2fbii43pmxwdvbk5z7j2kbwwigfmdgv4vve2tkl@wvtxubexck2r
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-06-06 15:21:13 -0400, Tom Lane wrote:
> Konstantin Knizhnik <knizhnik(at)garret(dot)ru> writes:
> > There is really essential difference in code generated by clang 15
> > (working) and 16 (not working).
>
> It's a mistake to think that this is a compiler bug. The C standard
> explicitly allows compilers to use word-wide operations to access
> bit-field struct members. Such accesses may fetch or overwrite other
> bit-fields in the same word, using a non-atomic read/modify/write
> sequence. I don't have a recent C standard at hand, but I found this
> quote on the web [1]:
>
> The C Standard, 3.17, paragraph 3 [ISO/IEC 9899:2024], states
>
> Note 2 to entry: A bit-field and an adjacent non-bit-field member
> are in separate memory locations. The same applies to two
> bit-fields, if one is declared inside a nested structure
> declaration and the other is not, or if the two are separated by a
> zero-length bit-field declaration, or if they are separated by a
> non-bit-field member declaration. It is not safe to concurrently
> update two non-atomic bit-fields in the same structure if all
> members declared between them are also (nonzero-length)
> bit-fields, no matter what the sizes of those intervening
> bit-fields happen to be.
>
> So it's our code that is busted. No doubt, what is happening is
> that process A is fetching two fields, modifying one of them,
> and storing the word back (with the observed value of the other
> field) concurrently with some other process trying to update
> the other field. So the other process's update is lost.

There shouldn't be any concurrent accesses here, so I don't really see how the
above would explain the problem (the IO can only ever be modified by one
backend, initially the "owning backend", then, when submitted, by the IO
worker, and then again by the backend).

Alexander's latest post has a good example (due to the added logging)

At the start of reclaim (looks sane):
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2, ioh->generation: 29202

At the end of reclaim (looks sane), note ioh->op = 0
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29203

Just a bit later (broken):
!!!AsyncReadBuffers [91056] (1)| blocknum: 73, ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 29203

Note that ioh->op went back to 170, despite the code that sets ioh->op not yet
having been called. However the increment of the increment of ioh->generation
survived. Between the second "!!!pgaio_io_reclaim" and "!!!AsyncReadBuffers"
there definitely aren't any other processes accessing things.

> If we want to preserve the compactness of this struct, we have
> to use primitive int types not bit-fields.

I think we have to do that, if for no other reason than how bad the generated
code is. But I would still like to understand what actually causes the problem
here - I'm far from sure it's actually the bitfield.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2025-06-06 20:03:07 Re: CREATE DATABASE command for non-libc providers
Previous Message Tom Lane 2025-06-06 19:21:13 Re: Non-reproducible AIO failure