From: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Non-reproducible AIO failure |
Date: | 2025-06-12 05:03:22 |
Message-ID: | 1fea555c-0345-46dc-8da5-5e667cad436a@garret.ru |
Lists: | pgsql-hackers |
I tried to catch the moment when the memory is changed, using mprotect.
I aligned PgAioHandle on a page boundary (16 kB on macOS) and disabled
writes in `pgaio_io_reclaim`:
```
static void
pgaio_io_reclaim(PgAioHandle *ioh)
{
    RESUME_INTERRUPTS();
    rc = mprotect(ioh, sizeof(*ioh), PROT_READ);
    Assert(rc == 0);
    fprintf(stderr, "!!!pgaio_io_reclaim [%d]| ioh: %p, ioh->op: %d, ioh->generation: %llu\n",
            getpid(), ioh, ioh->op, ioh->generation);
}
```
and reenable writes in `pgaio_io_before_start` and `pgaio_io_acquire_nb`:
```
static void
pgaio_io_before_start(PgAioHandle *ioh)
{
    int rc = mprotect(ioh, sizeof(*ioh), PROT_READ|PROT_WRITE);
    Assert(rc == 0);
```
and
```
PgAioHandle *
pgaio_io_acquire_nb(struct ResourceOwnerData *resowner, PgAioReturn *ret)
{
    ...
    ioh = dclist_container(PgAioHandle, node, ion);
    Assert(ioh->state == PGAIO_HS_IDLE);
    Assert(ioh->owner_procno == MyProcNumber);
    rc = mprotect(ioh, sizeof(*ioh), PROT_READ|PROT_WRITE);
    Assert(rc == 0);
}
```
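For anyone who wants to try the same technique outside the server, here is a minimal standalone sketch. It is not the patch used above: `DemoHandle` and the `demo_*` helpers are hypothetical names invented for illustration, the struct is only a stand-in for `PgAioHandle`, and the memory comes from a private anonymous `mmap` rather than PostgreSQL shared memory. It shows that a store to a page protected with `PROT_READ` raises SIGSEGV (or SIGBUS on some platforms), and that the store succeeds again once the page is made writable.

```c
#define _DEFAULT_SOURCE			/* for MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for PgAioHandle; not the real struct layout. */
typedef struct DemoHandle
{
	int			op;
	unsigned long long generation;
} DemoHandle;

static sigjmp_buf demo_fault_jmp;

/* Jump back to the faulting call site instead of terminating the process. */
static void
demo_fault_handler(int signo)
{
	(void) signo;
	siglongjmp(demo_fault_jmp, 1);
}

/* Size of one handle rounded up to whole pages, as mprotect works per page. */
static size_t
demo_handle_len(void)
{
	size_t		pagesz = (size_t) sysconf(_SC_PAGESIZE);

	return (sizeof(DemoHandle) + pagesz - 1) / pagesz * pagesz;
}

/* Allocate a page-aligned handle so protecting it affects nothing else. */
static DemoHandle *
demo_handle_alloc(void)
{
	void	   *p = mmap(NULL, demo_handle_len(), PROT_READ | PROT_WRITE,
						 MAP_PRIVATE | MAP_ANON, -1, 0);

	return (p == MAP_FAILED) ? NULL : (DemoHandle *) p;
}

/* Make the handle read-only (writable == 0) or read-write; returns mprotect's rc. */
static int
demo_handle_set_writable(DemoHandle *h, int writable)
{
	return mprotect(h, demo_handle_len(),
					writable ? (PROT_READ | PROT_WRITE) : PROT_READ);
}

/* Try to store value into h->op; return 1 if the store faulted, 0 if it succeeded. */
static int
demo_write_faults(DemoHandle *h, int value)
{
	struct sigaction sa, old_segv, old_bus;
	volatile int faulted = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = demo_fault_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old_segv);
	sigaction(SIGBUS, &sa, &old_bus);

	if (sigsetjmp(demo_fault_jmp, 1) == 0)
		*(volatile int *) &h->op = value;	/* faults if page is read-only */
	else
		faulted = 1;			/* we got here via siglongjmp from the handler */

	/* Restore the previous handlers. */
	sigaction(SIGSEGV, &old_segv, NULL);
	sigaction(SIGBUS, &old_bus, NULL);
	return faulted;
}
```

Calling `demo_handle_set_writable(h, 0)` and then `demo_write_faults(h, 2)` should report a fault and leave `h->op` unchanged. One caveat of the sketch: `mprotect` changes the protection only of the calling process's own mapping, so a store made through a different mapping of the same memory (for example by another process attached to the same shared memory segment) would not fault.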
The error is reproduced after 133 iterations:
```
!!!pgaio_io_reclaim [20376]| ioh: 0x1019bc000, ioh->op: 0, ioh->generation: 19346
!!!AsyncReadBuffers [20376] (1)| blocknum: 21, ioh: 0x1019bc000, ioh->op: 1, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 19346
2025-06-12 01:05:31.865 EEST [20376:918] pg_regress/psql LOG: !!!pgaio_io_before_start| ioh: 0x1019bc000, ioh->op: 1, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 2, ioh->generation: 19346
```
But no write-protection violation happened.
I do not know how to interpret this. Are the changes made by the kernel?
Or was `pgaio_io_acquire_nb` called between `pgaio_io_reclaim` and
`pgaio_io_before_start`?
I am now going to add tracing to `pgaio_io_acquire_nb`.