From: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Non-reproducible AIO failure |
Date: | 2025-06-15 11:48:43 |
Message-ID: | 174af978-1406-4342-8990-87f2f0804107@garret.ru |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 13/06/2025 11:20 pm, Andres Freund wrote:
> Attached is a patch that fixes the problem for me. Alexander, Konstantin,
> could you verify that it also fixes the problem for you?
>
> Given that it does address the problem for me, I'm inclined to push this
> fairly soon, the barrier is pretty obviously required.
>
Unfortunately I still able to reproduce assertion failure after 17450
seconds (`--enable-debug --enable-cassert CFLAGS=-O0`):
* frame #0: 0x0000000187248704 libsystem_kernel.dylib`__pthread_kill + 8
frame #1: 0x000000018727fc28 libsystem_pthread.dylib`pthread_kill + 288
frame #2: 0x000000018718dae8 libsystem_c.dylib`abort + 180
frame #3: 0x00000001011a0fbc
postgres`ExceptionalCondition(conditionName="ioh->op ==
PGAIO_OP_INVALID", fileName="aio_io.c", lineNumber=161) at assert.c:66:2
frame #4: 0x0000000100ea9b40
postgres`pgaio_io_before_start(ioh=0x000000010a6e22f0) at aio_io.c:161:2
frame #5: 0x0000000100ea9a00
postgres`pgaio_io_start_readv(ioh=0x000000010a6e22f0, fd=12, iovcnt=1,
offset=85565440) at aio_io.c:81:2
frame #6: 0x0000000100ec8798
postgres`FileStartReadV(ioh=0x000000010a6e22f0, file=4, iovcnt=1,
offset=85565440, wait_event_info=167772181) at fd.c:2241:2
frame #7: 0x0000000100f1b7d0
postgres`mdstartreadv(ioh=0x000000010a6e22f0, reln=0x000000010b0289c8,
forknum=MAIN_FORKNUM, blocknum=10445, buffers=0x000000016f5ed998,
nblocks=1) at md.c:1019:8
frame #8: 0x0000000100f1ed88
postgres`smgrstartreadv(ioh=0x000000010a6e22f0, reln=0x000000010b0289c8,
forknum=MAIN_FORKNUM, blocknum=10445, buffers=0x000000016f5ed998,
nblocks=1) at smgr.c:758:2
frame #9: 0x0000000100eb1a48
postgres`AsyncReadBuffers(operation=0x000000011d80d108,
nblocks_progress=0x000000016f5edeb4) at bufmgr.c:1959:3
frame #10: 0x0000000100eb0c24
postgres`StartReadBuffersImpl(operation=0x000000011d80d108,
buffers=0x000000011d80cdf4, blockNum=10445, nblocks=0x000000016f5edeb4,
flags=0, allow_forwarding=true) at bufmgr.c:1428:18
I wonder if we also need second write barrier in pgaio_io_update_state:
/*
* Ensure the changes signified by the new state are visible before the
* new state becomes visible.
*/
pg_write_barrier();
ioh->state = new_state;
/*
* Make sure that status is written
*/
pg_write_barrier();
Otherwise is is not clear when status is written.
Also I think that replacing bitfields with `uint8` and may be even with
`int`, is good idea at least to avoids false sharing.
From | Date | Subject | |
---|---|---|---|
Next Message | Ryo Kanbayashi | 2025-06-15 12:02:31 | Re: [PATCH] PGSERVICEFILE as part of a normal connection string |
Previous Message | Alexander Korotkov | 2025-06-15 11:02:44 | Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly |