Re: Non-reproducible AIO failure

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Non-reproducible AIO failure
Date: 2025-06-06 03:00:00
Message-ID: f50fe8a7-4e6a-4a72-9347-36c1960b91b2@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

05.06.2025 22:00, Alexander Lakhin wrote:
> Thank you for your attention to this and for the tip! Today I tried the
> following:
> --- a/src/include/storage/aio.h
> +++ b/src/include/storage/aio.h
> @@ -89,8 +89,8 @@ typedef enum PgAioOp
>         /* intentionally the zero value, to help catch zeroed memory etc */
>         PGAIO_OP_INVALID = 0,
>
> -       PGAIO_OP_READV,
> -       PGAIO_OP_WRITEV,
> +       PGAIO_OP_READV = 0xaa,
> +       PGAIO_OP_WRITEV = 0xbb,

And one more bit of information from me, if you're not tired of it yet:
I've added:
@@ -2238,6 +2240,7 @@ FileStartReadV(PgAioHandle *ioh, File file,

        vfdP = &VfdCache[file];

+if (ioh->op != PGAIO_OP_INVALID) elog(LOG, "!!!FileStartReadV| ioh: %p, ioh->op: %d, ioh->state: %d, ioh->result: %d,
ioh->num_callbacks: %d, ioh->generation: %llu", ioh, ((PgAioHandle *)ioh)->op, ((PgAioHandle *)ioh)->state,
((PgAioHandle *)ioh)->result, ((PgAioHandle *)ioh)->num_callbacks, ((PgAioHandle *)ioh)->generation);
        pgaio_io_start_readv(ioh, vfdP->fd, iovcnt, offset);

and got:
...
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29200
!!!AsyncReadBuffers [91056] (1)| blocknum: 70, ioh: 0x1066d15d0, ioh->op: 0, ioh->state: 1, ioh->result: 0,
ioh->num_callbacks: 0, ioh->generation: 29200
!!!pgaio_io_start_readv [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->generation: 29200
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2,
ioh->generation: 29200
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29201
!!!AsyncReadBuffers [91056] (1)| blocknum: 71, ioh: 0x1066d15d0, ioh->op: 0, ioh->state: 1, ioh->result: 0,
ioh->num_callbacks: 0, ioh->generation: 29201
!!!pgaio_io_start_readv [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->generation: 29201
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2,
ioh->generation: 29201
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29202
!!!AsyncReadBuffers [91056] (1)| blocknum: 72, ioh: 0x1066d15d0, ioh->op: 0, ioh->state: 1, ioh->result: 0,
ioh->num_callbacks: 0, ioh->generation: 29202
!!!pgaio_io_start_readv [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->generation: 29202
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2,
ioh->generation: 29202
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29203
!!!AsyncReadBuffers [91056] (1)| blocknum: 73, ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 1, ioh->result: 0,
ioh->num_callbacks: 0, ioh->generation: 29203
2025-06-05 19:52:44.174 EDT [91056:14] pg_regress/brin LOG: !!!FileStartReadV| ioh: 0x1066d15d0, ioh->op: 170,
ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 2, ioh->generation: 29203
2025-06-05 19:52:44.174 EDT [91056:15] pg_regress/brin CONTEXT:  SQL statement "SELECT array_agg(ctid) FROM brintest
WHERE cidrcol = '10.2.14/24'::inet "
        PL/pgSQL function inline_code_block line 52 at EXECUTE
2025-06-05 19:52:44.174 EDT [91056:16] pg_regress/brin STATEMENT: DO $x$
...
2025-06-05 19:52:44.174 EDT [91056:17] pg_regress/brin LOG: !!!pgaio_io_before_start| ioh: 0x1066d15d0, ioh->op: 170,
ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 2, ioh->generation: 29203
2025-06-05 19:52:44.174 EDT [91056:18] pg_regress/brin CONTEXT:  SQL statement "SELECT array_agg(ctid) FROM brintest
WHERE cidrcol = '10.2.14/24'::inet "
        PL/pgSQL function inline_code_block line 52 at EXECUTE
2025-06-05 19:52:44.174 EDT [91056:19] pg_regress/brin STATEMENT: DO $x$
        DECLARE
                r record;
                r2 record;
                cond text;
                idx_ctids tid[];
                ss_ctids tid[];
                count int;
                plan_ok bool;
                plan_line text;
        BEGIN
                FOR r IN SELECT colname, oper, typ, value[ordinality], matches[ordinality] FROM brinopers, unnest(op)
WITH ORDINALITY AS oper LOOP
...
TRAP: failed Assert("ioh->op == PGAIO_OP_INVALID"), File: "aio_io.c", Line: 164, PID: 91056
0   postgres                            0x0000000104ff771c ExceptionalCondition + 236
1   postgres                            0x0000000104cfe32c pgaio_io_before_start + 516
2   postgres                            0x0000000104cfe080 pgaio_io_start_readv + 36
3   postgres                            0x0000000104d1d310 FileStartReadV + 508
4   postgres                            0x0000000104d705f0 mdstartreadv + 668
5   postgres                            0x0000000104d73bd8 smgrstartreadv + 116
6   postgres                            0x0000000104d06484 AsyncReadBuffers + 2028
7   postgres                            0x0000000104d05420 StartReadBuffersImpl + 1196
8   postgres                            0x0000000104d04f64 StartReadBuffers + 64
9   postgres                            0x0000000104d0256c read_stream_start_pending_read + 1204
10  postgres                            0x0000000104d01c50 read_stream_look_ahead + 812
11  postgres                            0x0000000104d017e8 read_stream_next_buffer + 2356
12  postgres                            0x00000001046f661c heap_fetch_next_buffer + 284
13  postgres                            0x00000001046e73b8 heapgettup_pagemode + 192
14  postgres                            0x00000001046e7988 heap_getnextslot + 84

Thus, this time ioh->op switched to PGAIO_OP_READV earlier, even before
the logging in AsyncReadBuffers(). By the way, this
"DO .. FOR ... FROM brinopers, unnest(op) ... " triggers the assertion not
for the first time for me (I've counted 3 such failures in my notes).

Best regards,
Alexander Lakhin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2025-06-06 03:41:02 Re: [PATCH] Re: Proposal to Enable/Disable Index using ALTER INDEX
Previous Message Robert Treat 2025-06-06 02:32:39 Re: [PATCH] Re: Proposal to Enable/Disable Index using ALTER INDEX