Re: Non-reproducible AIO failure

From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Non-reproducible AIO failure
Date: 2025-06-15 17:54:54
Message-ID: b92670dd-0a5f-4ea6-9cd1-12f59f5b3bcf@garret.ru
Lists: pgsql-hackers

With these two additional changes:

diff --git a/src/backend/storage/aio/aio.c b/src/backend/storage/aio/aio.c
index 6c6c0a908e2..6dd2816bea9 100644
--- a/src/backend/storage/aio/aio.c
+++ b/src/backend/storage/aio/aio.c
@@ -538,6 +538,9 @@ pgaio_io_process_completion(PgAioHandle *ioh, int result)

        pgaio_io_update_state(ioh, PGAIO_HS_COMPLETED_SHARED);

+       /* ensure the state update is visible before we broadcast condition variable */
+       pg_write_barrier();
+
        /* condition variable broadcast ensures state is visible before wakeup */
        ConditionVariableBroadcast(&ioh->cv);

bool only_running);
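The first hunk adds a write barrier between publishing the new handle state and waking the waiters. The same "publish, fence, then signal" pattern can be sketched in portable C11. This is only an illustration under stated assumptions: the struct, state values and functions below are hypothetical, a C11 release fence stands in for pg_write_barrier() (they are not identical; the release fence is somewhat stronger), and a spinning atomic flag stands in for ConditionVariableBroadcast()/ConditionVariableSleep().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical state values; not PostgreSQL's PgAioHandleState. */
enum { HS_IN_FLIGHT = 1, HS_COMPLETED_SHARED = 2 };

/* Hypothetical handle: 'published' stands in for the CV wakeup. */
struct handle
{
    int         result;
    uint8_t     state;
    atomic_int  published;
};

/* Completer side: store result and state, fence, then signal. */
static void *
complete_io(void *arg)
{
    struct handle *h = arg;

    h->result = 42;
    h->state = HS_COMPLETED_SHARED;

    /* Release fence, roughly in the role of pg_write_barrier(): the plain
     * stores above become visible before the signal store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&h->published, 1, memory_order_relaxed);
    return NULL;
}

/* Waiter side: spin until signalled (a real CV would sleep), then fence. */
static uint8_t
wait_for_io(struct handle *h)
{
    while (atomic_load_explicit(&h->published, memory_order_relaxed) == 0)
        ;
    /* Acquire fence pairs with the release fence in complete_io(). */
    atomic_thread_fence(memory_order_acquire);
    return h->state;
}

/* Runs one completion on a second thread; returns the state the waiter saw. */
static uint8_t
run_once(int *result_out)
{
    struct handle h;
    pthread_t   completer;
    uint8_t     seen;

    h.result = 0;
    h.state = HS_IN_FLIGHT;
    atomic_init(&h.published, 0);

    if (pthread_create(&completer, NULL, complete_io, &h) != 0)
        return 0;
    seen = wait_for_io(&h);
    *result_out = h.result;
    pthread_join(completer, NULL);
    return seen;
}
```

Without the two fences, the compiler or a weakly ordered CPU would be free to let the waiter observe the signal before the new state and result.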
diff --git a/src/include/storage/aio_internal.h b/src/include/storage/aio_internal.h
index 2d37a243abe..0a2bb109696 100644
--- a/src/include/storage/aio_internal.h
+++ b/src/include/storage/aio_internal.h
@@ -96,13 +96,13 @@ struct ResourceOwnerData;
 struct PgAioHandle
 {
        /* all state updates should go through pgaio_io_update_state() */
-       PgAioHandleState state:8;
+       uint8           state;

        /* what are we operating on */
-       PgAioTargetID target:8;
+       uint8           target;

        /* which IO operation */
-       PgAioOp         op:8;
+       uint8           op;

        /* bitfield of PgAioHandleFlags */
        uint8           flags;
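The second hunk replaces the 8-bit enum bitfields with plain uint8 members. A plausible reason this matters, sketched below with hypothetical struct names: under the C11 memory model a run of adjacent non-zero-width bitfields is a single memory location, so a store to one field may be compiled as a read-modify-write of its neighbours, and concurrent writers of different fields would race. Each uint8_t member, by contrast, is its own memory location with its own address. The bitfield variant uses unsigned int rather than the original enum types; whether PostgreSQL ever writes these neighbouring fields concurrently is not established here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the head of PgAioHandle after the patch.
 * Each uint8_t member is a separate C11 "memory location", so a store to
 * 'state' may not be implemented as a write to 'target' or 'op'.
 */
struct handle_bytes
{
    uint8_t     state;
    uint8_t     target;
    uint8_t     op;
    uint8_t     flags;
};

/*
 * Hypothetical stand-in for the layout before the patch. state, target
 * and op form ONE memory location, so concurrent writes to different
 * fields would be a data race; offsetof() cannot be applied to them.
 */
struct handle_bitfields
{
    unsigned int state:8;
    unsigned int target:8;
    unsigned int op:8;
    uint8_t     flags;
};

/* Byte offsets of the post-patch fields: one distinct byte each. */
static void
byte_field_offsets(size_t out[4])
{
    out[0] = offsetof(struct handle_bytes, state);
    out[1] = offsetof(struct handle_bytes, target);
    out[2] = offsetof(struct handle_bytes, op);
    out[3] = offsetof(struct handle_bytes, flags);
}
```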

the problem does not reproduce on my system within 20000 seconds. I will
leave it running overnight.
