From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru> |
Cc: | Alexander Lakhin <exclusion(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Nico Williams <nico(at)cryptonector(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rmt(at)lists(dot)postgresql(dot)org |
Subject: | Re: Non-reproducible AIO failure |
Date: | 2025-08-24 12:38:16 |
Message-ID: | CA+hUKGK6ujMT5myrEkgQ+n-N3rquZA4haHfJszQVe4ofHd6z6A@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Aug 24, 2025 at 5:32 AM Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:
> On 20/08/2025 9:00 PM, Alexander Lakhin wrote:
> > for i in {1..10}; do np=$((20 + $RANDOM % 10)); echo "iteration $i:
> > $np"; time parallel -j40 --linebuffer --tag /tmp/repro-AIO-Assert.sh
> > {} ::: `seq $np` || break; sleep $(($RANDOM % 20)); done; echo -e "\007"
>
>
> Unfortunately I was not able to reproduce the problem on my MacBook M4
> Pro (which should be identical with yours) using your instructions: I
> tried five times but in all cases 10 iterations are completed normally.
I can't reproduce it either. I think the sequence is:
IO worker: pgaio_io_update_state(ioh, PGAIO_HS_COMPLETED_SHARED)
Backend: pgaio_io_wait(ioh, generation)
pgaio_io_was_recycled(ioh, generation, &state)
case for state == PGAIO_HS_COMPLETED_SHARED
pgaio_io_reclaim(boh)
pgaio_io_update_state(ioh, PGAIO_HS_IDLE)
ioh->op = PGAIO_OP_INVALID;
Backend: pgaio_io_acquire_nb()
Assert(ioh->state == PGAIO_HS_IDLE) *fails*
pgaio_io_update_state() is essentially:
pg_write_barrier();
ioh->state = new_state;
I looked at the code generated by a non-optimised build:
(lldb) dis --name pgaio_io_update_state
...
postgres[0x100687fc4] <+372>: dmb ish XXX memory barrier
postgres[0x100687fc8] <+376>: ldur w10, [x29, #-0xc]
postgres[0x100687fcc] <+380>: ldur x9, [x29, #-0x8]
postgres[0x100687fd0] <+384>: ldrh w8, [x9] XXX load
state + target
postgres[0x100687fd4] <+388>: ldrb w11, [x9, #0x2] XXX load op
postgres[0x100687fd8] <+392>: orr w8, w8, w11, lsl #16
postgres[0x100687fdc] <+396>: and w10, w10, #0xff
postgres[0x100687fe0] <+400>: and w8, w8, #0xffffff00
postgres[0x100687fe4] <+404>: orr w10, w8, w10
postgres[0x100687fe8] <+408>: lsr w8, w10, #16
postgres[0x100687fec] <+412>: strh w10, [x9] XXX store
state + target
postgres[0x100687ff0] <+416>: strb w8, [x9, #0x2] XXX store op
postgres[0x100687ff4] <+420>: ldp x29, x30, [sp, #0x60]
postgres[0x100687ff8] <+424>: add sp, sp, #0x70
postgres[0x100687ffc] <+428>: ret
Even the -O3 version does that, it just uses fancier instructions:
postgres[0x100687c08] <+316>: dmb ish
postgres[0x100687c0c] <+320>: ldurh w8, [x20, #0x1]
postgres[0x100687c10] <+324>: bfi w19, w8, #8, #16
postgres[0x100687c14] <+328>: lsr w8, w8, #8
postgres[0x100687c18] <+332>: strb w8, [x20, #0x2] XXX clobber op
postgres[0x100687c1c] <+336>: strh w19, [x20]
Could it be that the store buffer was flushed between the two stores,
pgaio_io_was_recycled() saw the new state, pgaio_io_reclaim() assigned
ioh->op = PGAIO_OP_INVALID and it was flushed to L1, and then finally
the time travelling op value was flushed and clobbered it?
Apple clang 17 output (= LLVM 19) with optimisation is smart enough to
use a single store:
postgres[0x1002f0ae8] <+308>: dmb ish
postgres[0x1002f0aec] <+312>: strb w19, [x20]
That's also how open source clang 17 compiles it if you rip out the
bitfield. I guess if you do that, you won't be able to reproduce
this, Alexander? Something like:
struct PgAioHandle
{
/* all state updates should go through pgaio_io_update_state() */
- PgAioHandleState state:8;
+ uint8 state;
/* what are we operating on */
- PgAioTargetID target:8;
+ uint8 target;
/* which IO operation */
- PgAioOp op:8;
+ uint8 op;
Curiously, Apple clang 17 without optimisation does some amount of
unnecessary clobbering, but not the whole bitset and it's only a
single 16 bit store.
postgres[0x10057bfb0] <+348>: dmb ish
postgres[0x10057bfb4] <+352>: ldur w10, [x29, #-0xc]
postgres[0x10057bfb8] <+356>: ldur x9, [x29, #-0x8]
postgres[0x10057bfbc] <+360>: ldrh w8, [x9]
postgres[0x10057bfc0] <+364>: and w10, w10, #0xff
postgres[0x10057bfc4] <+368>: and w8, w8, #0xffffff00
postgres[0x10057bfc8] <+372>: orr w8, w8, w10
postgres[0x10057bfcc] <+376>: strh w8, [x9]
From | Date | Subject | |
---|---|---|---|
Next Message | Marc-Olaf Jaschke | 2025-08-24 13:03:42 | Re: Improve hash join's handling of tuples with null join keys |
Previous Message | Srinath Reddy Sadipiralla | 2025-08-24 12:28:44 | Re: Potential problem in commit f777d773878 and 4f7f7b03758 |