Re: Non-reproducible AIO failure

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Non-reproducible AIO failure
Date: 2025-06-03 05:00:01
Message-ID: 39f3fee6-e2f5-4613-833a-5131e24d4957@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

02.06.2025 09:00, Alexander Lakhin wrote:
> With additional logging (the patch is attached), I can see the following:
> ...
> !!!pgaio_io_reclaim [63817]| ioh: 0x1046b5660, ioh->op: 1, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2
> !!!AsyncReadBuffers [63817] (1)| blocknum: 18, ioh: 0x1046b5660, ioh->op: 0, ioh->state: 1, ioh->result: 0,
> ioh->num_callbacks: 0
> 2025-06-02 00:45:11.391 EDT [63817:14] pg_regress/brin LOG: !!!pgaio_io_before_start| ioh: 0x1046b5660, ioh->op: 1,
> ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 2
>
> TRAP: failed Assert("ioh->op == PGAIO_OP_INVALID"), File: "aio_io.c", Line: 164, PID: 63817

I've added logging of the ioh->generation field at the beginning and at
the end of pgaio_io_reclaim() (the complete patch is attached) and got
even more interesting output:
...
!!!pgaio_io_reclaim [25175]| ioh: 0x104c3e1a0, ioh->op: 1, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2,
ioh->generation: 21692
!!!pgaio_io_reclaim [25175]| ioh: 0x104c3e1a0, ioh->op: 0, ioh->generation: 21693
!!!AsyncReadBuffers [25175] (1)| blocknum: 40, ioh: 0x104c3e1a0, ioh->op: 0, ioh->state: 1, ioh->result: 0,
ioh->num_callbacks: 0, ioh->generation: 21693
!!!pgaio_io_start_readv [25175]| ioh: 0x104c3e1a0, ioh->op: 1, ioh->generation: 21693
!!!pgaio_io_reclaim [25176]| ioh: 0x104c3e080, ioh->op: 1, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2,
ioh->generation: 21408
!!!pgaio_io_reclaim [25176]| ioh: 0x104c3e080, ioh->op: 0, ioh->generation: 21409
!!!AsyncReadBuffers [25176] (1)| blocknum: 49, ioh: 0x104c3e080, ioh->op: 0, ioh->state: 1, ioh->result: 0,
ioh->num_callbacks: 0, ioh->generation: 21409
!!!pgaio_io_start_readv [25176]| ioh: 0x104c3e080, ioh->op: 1, ioh->generation: 21409
!!!pgaio_io_reclaim [25175]| ioh: 0x104c3e1a0, ioh->op: 1, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2,
ioh->generation: 21693
!!!pgaio_io_reclaim [25175]| ioh: 0x104c3e1a0, ioh->op: 0, ioh->generation: 21694
!!!AsyncReadBuffers [25175] (1)| blocknum: 41, ioh: 0x104c3e1a0, ioh->op: 0, ioh->state: 1, ioh->result: 0,
ioh->num_callbacks: 0, ioh->generation: 21694
2025-06-03 00:19:09.282 EDT [25175:1] LOG: !!!pgaio_io_before_start| ioh: 0x104c3e1a0, ioh->op: 1, ioh->state: 1,
ioh->result: 0, ioh->num_callbacks: 2, ioh->generation: 21694
2025-06-03 00:19:09.282 EDT [25175:2] STATEMENT:  explain (analyze, format 'json')
          select count(*) from join_foo
            left join (select b1.id, b1.t from join_bar b1 join join_bar b2 using (id)) ss
            on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;

TRAP: failed Assert("ioh->op == PGAIO_OP_INVALID"), File: "aio_io.c", Line: 164, PID: 25175

0   postgres                            0x0000000103563550 ExceptionalCondition + 236
1   postgres                            0x000000010326a260 pgaio_io_before_start + 516
2   postgres                            0x0000000103269fb4 pgaio_io_start_readv + 36
3   postgres                            0x0000000103289144 FileStartReadV + 252
4   postgres                            0x00000001032dc424 mdstartreadv + 668
5   postgres                            0x00000001032dfa0c smgrstartreadv + 116
6   postgres                            0x00000001032723b8 AsyncReadBuffers + 2028
7   postgres                            0x0000000103271354 StartReadBuffersImpl + 1196
8   postgres                            0x0000000103270e98 StartReadBuffers + 64
9   postgres                            0x000000010326e4a0 read_stream_start_pending_read + 1204
10  postgres                            0x000000010326db84 read_stream_look_ahead + 812
11  postgres                            0x000000010326d71c read_stream_next_buffer + 2356
...

Best regards,
Alexander Lakhin
Neon (https://neon.tech)

Attachment Content-Type Size
aio-debugging-0603.patch text/x-patch 3.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nisha Moond 2025-06-03 05:35:07 Re: Fix slot synchronization with two_phase decoding enabled
Previous Message Michael Paquier 2025-06-03 04:51:06 Re: C11 / VS 2019