From: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, Nico Williams <nico(at)cryptonector(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rmt(at)lists(dot)postgresql(dot)org |
Subject: | Re: Non-reproducible AIO failure |
Date: | 2025-08-26 13:59:54 |
Message-ID: | f9e85423-4d1e-4106-a468-d348dee4610d@garret.ru |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 26/08/2025 3:37 AM, Andres Freund wrote:
> Hi,
>
> I'm a bit confused by this focus on bitfields - both Alexander and Konstantin
> stated they could reproduce the issue without the bitfields.
Sorry if I am not correct, but it seems that the problem was never
reproduced with replaced bitfields.
I have once again looked through all this thread (it is really long;)
and find just one place when you wrote, that you was able to reproduce
the problem without bitfields. But as far as understand it was the
different problem which was fixed by your patch
v1-0001-aio-Add-missing-memory-barrier-when-waiting-for-I.patch
<https://www.postgresql.org/message-id/attachment/177619/v1-0001-aio-Add-missing-memory-barrier-when-waiting-for-I.patch>
Still it is not quite clear to me how bitfields can cause this issue.
Yes, fitfields are updated all together - when you update one field, all
other bitfields are also rewritten.
But actually compiler (at least some versions) does quite
straightforward thing (especially without optimization):
1. Loads all bitfields.
2. Update required bitfield.
3. Stores all bitfields.
It is very unlikely that some stale version is stored in registers -
without -O optimizer is using registers only locally and doesn't try to
keep some variables in registers. Also not clear how we can read some
stale value from memory if we perform write barrier before updating
status (in pgaio_io_update_state).
It it impossible that `op` and `state` fields belongs to different cache
lines because PgAioHandle is aligned on 8. So doesn;t matter whether
there are bitfields or bytes, this fields should be propagated between
cores all together.
Certainly, extraction of bitfields is not an atomic operations (as we
see in assembler, it is done using two loads).
So we can observe inconsistent state of this two fields.
Let's consider fields `op` and `state`. Let's their initial state is
`op=1`m `state=1`.
The we do update `op=2` and call pgaio_io_update_state(2). Some other
task can see (op=1, state=1) or (op=2, state=1) or (op=2, state=2) but
not (op=1, state=2).
The problem can happen only if some other task tries to update `op`
without prior checking for `state`.
In this case we can get (op=3,state=1) as well as (op=2,state=2).
But `op` field is updated in just two places: pgaio_io_reclaim and
pgaio_io_stage. And in both cases we first check 'state' value.
What makes me concern is what happen in case of branch misprediction.
Assume that we have code:
if (state == 1) {
op = 2;
} else {
op = 3;
}
It seems to be absolutely legal, but what will happen in case if with
speculative execution processor start executing first branch and does
assignment op=2, but then state is loaded and used to be not equal to 1,
so processor has to undone this execution. But for some moment we we can
have state (state=0 and op=2) which can be observed by other task and
cause assertion failure.
Is it really possible?
> But we have observed the generated code being pretty grotty and it's caused
> more than enough confusion - so let's just replace them with plain uint8's and
> cast in switches.
+1
May be I am wrong, but it seems to me that after
add-moissing-memory-barrier patch was applied nobody reproduced
assertion failure with replaced bitfields.
From | Date | Subject | |
---|---|---|---|
Next Message | Andrei Lepikhov | 2025-08-26 14:03:35 | Redundant parameter in the get_useful_pathkeys_for_relation |
Previous Message | Michael Banck | 2025-08-26 13:55:30 | Re: Dead code in ps_status.c |