Re: Non-reproducible AIO failure

From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Non-reproducible AIO failure
Date: 2025-06-10 14:28:11
Message-ID: 8678425d-50d0-4fcd-94e2-b92e711bf8f0@garret.ru
Lists: pgsql-hackers


On 09/06/2025 2:05 am, Thomas Munro wrote:
> On Sat, Jun 7, 2025 at 6:47 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2025-06-06 14:03:12 +0300, Konstantin Knizhnik wrote:
>>> There is really essential difference in code generated by clang 15 (working)
>>> and 16 (not working).
>> There also are code gen differences between upstream clang 17 and apple's
>> clang, which is based on llvm 17 as well (I've updated the toolchain, it
>> repros with that as well).
> Just for the record, Apple clang 17 (self-reported clobbered version)
> is said to be based on LLVM 19[1]. For a long time it was off by one
> but tada now it's apparently two. Might be relevant if people are
> comparing generated code up that close....
>
> . o O (I wonder if one could corroborate that by running "strings" on
> upstream clang binaries (as compiled by MacPorts/whatever) for each
> major version and finding new strings, ie strings that don't appear in
> earlier major versions, and then seeing which ones are present in
> Apple's clang binaries... What a silly problem.)
>
> [1] https://en.wikipedia.org/wiki/Xcode#Xcode_15.0_-_16.x_(since_visionOS_support)

Some updates: I was able to reproduce the problem on my Mac with old
clang (15.0), but only with optimization disabled (CFLAGS=-O0).
So it is very unlikely to be a bug in the compiler.

Why is it easier to reproduce in a debug build? Maybe because of timing.
Or maybe because without optimization the compiler does something naive:
it loads all three bitfields from memory into a register (one half word + one
byte), manipulates that register, and writes it
back to memory. Can the register somehow be clobbered between the read and
the write (for example by a signal handler)? Very unlikely...
So I still do not have a good hypothesis.
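
To make the read-modify-write concern concrete, here is a minimal sketch in
plain C. It is not the PostgreSQL code and not a claim about the cause of the
failure: the three-byte layout (state, target, op) and all function names are
hypothetical. It emulates by hand what an unoptimized bitfield store might do
(load all three bytes, patch one, store all three back) and shows that a write
to a neighbouring field landing between the load and the store is lost, while
a plain one-byte store keeps it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: mem[0] = state, mem[1] = target, mem[2] = op. */
enum { F_STATE = 0, F_TARGET = 1, F_OP = 2, N_FIELDS = 3 };

/*
 * Emulates a "wide" bitfield store: copy all three bytes into a scratch
 * register, patch only the state byte, then write all three bytes back.
 * If interleave is nonzero, another writer updates op between the load
 * and the store, as an interrupting handler might.
 */
static void
wide_store_state(uint8_t *mem, uint8_t new_state, int interleave, uint8_t new_op)
{
    uint8_t reg[N_FIELDS];

    assert(mem != NULL);
    memcpy(reg, mem, N_FIELDS);     /* load half word + byte */
    if (interleave)
        mem[F_OP] = new_op;         /* neighbouring field changes in memory */
    reg[F_STATE] = new_state;       /* patch only the state byte */
    memcpy(mem, reg, N_FIELDS);     /* store back, including the stale op */
}

/* Emulates a separate uint8 member: only the state byte is written. */
static void
narrow_store_state(uint8_t *mem, uint8_t new_state, int interleave, uint8_t new_op)
{
    assert(mem != NULL);
    if (interleave)
        mem[F_OP] = new_op;
    mem[F_STATE] = new_state;       /* neighbours are never touched */
}

/* Returns 1 if the wide store loses the op update and the narrow one keeps it. */
int
lost_update_demo(void)
{
    uint8_t wide[N_FIELDS] = {1, 2, 3};
    uint8_t narrow[N_FIELDS] = {1, 2, 3};

    wide_store_state(wide, 9, 1, 7);
    narrow_store_state(narrow, 9, 1, 7);
    return wide[F_OP] == 3 && narrow[F_OP] == 7;
}
```

Whether any such interleaved write can actually happen here is exactly the
open question; the sketch only shows why separate uint8 members would remove
this particular hazard.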

But with the bitfields replaced with uint8 the bug no longer reproduces.
Maybe we should just make this change (which seems to be a good thing in any case)?
