From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Non-reproducible AIO failure |
Date: | 2025-06-06 03:00:00 |
Message-ID: | f50fe8a7-4e6a-4a72-9347-36c1960b91b2@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
05.06.2025 22:00, Alexander Lakhin wrote:
> Thank you for your attention to this and for the tip! Today I tried the
> following:
> --- a/src/include/storage/aio.h
> +++ b/src/include/storage/aio.h
> @@ -89,8 +89,8 @@ typedef enum PgAioOp
> /* intentionally the zero value, to help catch zeroed memory etc */
> PGAIO_OP_INVALID = 0,
>
> - PGAIO_OP_READV,
> - PGAIO_OP_WRITEV,
> + PGAIO_OP_READV = 0xaa,
> + PGAIO_OP_WRITEV = 0xbb,
And one more bit of information from me, if you're not tired of it yet:
I've added:
@@ -2238,6 +2240,7 @@ FileStartReadV(PgAioHandle *ioh, File file,
vfdP = &VfdCache[file];
+if (ioh->op != PGAIO_OP_INVALID) elog(LOG, "!!!FileStartReadV| ioh: %p, ioh->op: %d, ioh->state: %d, ioh->result: %d,
ioh->num_callbacks: %d, ioh->generation: %llu", ioh, ((PgAioHandle *)ioh)->op, ((PgAioHandle *)ioh)->state,
((PgAioHandle *)ioh)->result, ((PgAioHandle *)ioh)->num_callbacks, ((PgAioHandle *)ioh)->generation);
pgaio_io_start_readv(ioh, vfdP->fd, iovcnt, offset);
and got:
...
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29200
!!!AsyncReadBuffers [91056] (1)| blocknum: 70, ioh: 0x1066d15d0, ioh->op: 0, ioh->state: 1, ioh->result: 0,
ioh->num_callbacks: 0, ioh->generation: 29200
!!!pgaio_io_start_readv [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->generation: 29200
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2,
ioh->generation: 29200
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29201
!!!AsyncReadBuffers [91056] (1)| blocknum: 71, ioh: 0x1066d15d0, ioh->op: 0, ioh->state: 1, ioh->result: 0,
ioh->num_callbacks: 0, ioh->generation: 29201
!!!pgaio_io_start_readv [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->generation: 29201
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2,
ioh->generation: 29201
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29202
!!!AsyncReadBuffers [91056] (1)| blocknum: 72, ioh: 0x1066d15d0, ioh->op: 0, ioh->state: 1, ioh->result: 0,
ioh->num_callbacks: 0, ioh->generation: 29202
!!!pgaio_io_start_readv [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->generation: 29202
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2,
ioh->generation: 29202
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29203
!!!AsyncReadBuffers [91056] (1)| blocknum: 73, ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 1, ioh->result: 0,
ioh->num_callbacks: 0, ioh->generation: 29203
2025-06-05 19:52:44.174 EDT [91056:14] pg_regress/brin LOG: !!!FileStartReadV| ioh: 0x1066d15d0, ioh->op: 170,
ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 2, ioh->generation: 29203
2025-06-05 19:52:44.174 EDT [91056:15] pg_regress/brin CONTEXT: SQL statement "SELECT array_agg(ctid) FROM brintest
WHERE cidrcol = '10.2.14/24'::inet "
PL/pgSQL function inline_code_block line 52 at EXECUTE
2025-06-05 19:52:44.174 EDT [91056:16] pg_regress/brin STATEMENT: DO $x$
...
2025-06-05 19:52:44.174 EDT [91056:17] pg_regress/brin LOG: !!!pgaio_io_before_start| ioh: 0x1066d15d0, ioh->op: 170,
ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 2, ioh->generation: 29203
2025-06-05 19:52:44.174 EDT [91056:18] pg_regress/brin CONTEXT: SQL statement "SELECT array_agg(ctid) FROM brintest
WHERE cidrcol = '10.2.14/24'::inet "
PL/pgSQL function inline_code_block line 52 at EXECUTE
2025-06-05 19:52:44.174 EDT [91056:19] pg_regress/brin STATEMENT: DO $x$
DECLARE
r record;
r2 record;
cond text;
idx_ctids tid[];
ss_ctids tid[];
count int;
plan_ok bool;
plan_line text;
BEGIN
FOR r IN SELECT colname, oper, typ, value[ordinality], matches[ordinality] FROM brinopers, unnest(op)
WITH ORDINALITY AS oper LOOP
...
TRAP: failed Assert("ioh->op == PGAIO_OP_INVALID"), File: "aio_io.c", Line: 164, PID: 91056
0 postgres 0x0000000104ff771c ExceptionalCondition + 236
1 postgres 0x0000000104cfe32c pgaio_io_before_start + 516
2 postgres 0x0000000104cfe080 pgaio_io_start_readv + 36
3 postgres 0x0000000104d1d310 FileStartReadV + 508
4 postgres 0x0000000104d705f0 mdstartreadv + 668
5 postgres 0x0000000104d73bd8 smgrstartreadv + 116
6 postgres 0x0000000104d06484 AsyncReadBuffers + 2028
7 postgres 0x0000000104d05420 StartReadBuffersImpl + 1196
8 postgres 0x0000000104d04f64 StartReadBuffers + 64
9 postgres 0x0000000104d0256c read_stream_start_pending_read + 1204
10 postgres 0x0000000104d01c50 read_stream_look_ahead + 812
11 postgres 0x0000000104d017e8 read_stream_next_buffer + 2356
12 postgres 0x00000001046f661c heap_fetch_next_buffer + 284
13 postgres 0x00000001046e73b8 heapgettup_pagemode + 192
14 postgres 0x00000001046e7988 heap_getnextslot + 84
Thus, this time ioh->op switched to PGAIO_OP_READV earlier, even before
the logging in AsyncReadBuffers(). By the way, this
"DO .. FOR ... FROM brinopers, unnest(op) ... " triggers the assertion not
for the first time for me (I've counted 3 such failures in my notes).
Best regards,
Alexander Lakhin
From | Date | Subject | |
---|---|---|---|
Next Message | David G. Johnston | 2025-06-06 03:41:02 | Re: [PATCH] Re: Proposal to Enable/Disable Index using ALTER INDEX |
Previous Message | Robert Treat | 2025-06-06 02:32:39 | Re: [PATCH] Re: Proposal to Enable/Disable Index using ALTER INDEX |