From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru> |
Cc: | Nico Williams <nico(at)cryptonector(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rmt(at)lists(dot)postgresql(dot)org |
Subject: | Re: Non-reproducible AIO failure |
Date: | 2025-08-27 23:08:20 |
Message-ID: | orcdxxb3df5pvs6tfnh5bx5h4oiazb7txc54vlwoeidm27ashu@yho5nppko52o |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2025-08-26 16:59:54 +0300, Konstantin Knizhnik wrote:
> On 26/08/2025 3:37 AM, Andres Freund wrote:
> > Hi,
> >
> > I'm a bit confused by this focus on bitfields - both Alexander and Konstantin
> > stated they could reproduce the issue without the bitfields.
>
>
> Sorry if I am not correct, but it seems that the problem was never
> reproduced with replaced bitfields.
I think unfortunately you and Thomas that are correct, and me that somehow got
confused. I went through the entire thread and I'm not sure how I got the
impression that the problem could be reproduced without use of the
bitfields. Sorry for that!
I'll push the patch to remove the bitfields after adjusting the commit message
somewhat.
> Still it is not quite clear to me how bitfields can cause this issue.
Same.
> Yes, fitfields are updated all together - when you update one field, all
> other bitfields are also rewritten.
> But actually compiler (at least some versions) does quite straightforward
> thing (especially without optimization):
> 1. Loads all bitfields.
> 2. Update required bitfield.
> 3. Stores all bitfields.
>
> It is very unlikely that some stale version is stored in registers - without
> -O optimizer is using registers only locally and doesn't try to keep some
> variables in registers. Also not clear how we can read some stale value from
> memory if we perform write barrier before updating status (in
> pgaio_io_update_state).
> It it impossible that `op` and `state` fields belongs to different cache
> lines because PgAioHandle is aligned on 8. So doesn;t matter whether there
> are bitfields or bytes, this fields should be propagated between cores all
> together.
> Certainly, extraction of bitfields is not an atomic operations (as we see in
> assembler, it is done using two loads).
> So we can observe inconsistent state of this two fields.
>
> Let's consider fields `op` and `state`. Let's their initial state is `op=1`m
> `state=1`.
> The we do update `op=2` and call pgaio_io_update_state(2). Some other task
> can see (op=1, state=1) or (op=2, state=1) or (op=2, state=2) but not
> (op=1, state=2).
>
> The problem can happen only if some other task tries to update `op` without
> prior checking for `state`.
Which isn't possible here - the handle was idle (i.e. not accessed by another
backend) just before the assertion failure and is being initialized at that
time. Which all happens within one process, without any concurrent accesses.
> In this case we can get (op=3,state=1) as well as (op=2,state=2).
> But `op` field is updated in just two places: pgaio_io_reclaim and
> pgaio_io_stage. And in both cases we first check 'state' value.
>
> What makes me concern is what happen in case of branch misprediction.
> Assume that we have code:
>
> if (state == 1) {
> op = 2;
> } else {
> op = 3;
> }
>
> It seems to be absolutely legal, but what will happen in case if with
> speculative execution processor start executing first branch and does
> assignment op=2, but then state is loaded and used to be not equal to 1, so
> processor has to undone this execution. But for some moment we we can
> have state (state=0 and op=2) which can be observed by other task and cause
> assertion failure.
> Is it really possible?
No, the CPU will not retire ("commit") instructions that are still dependent
on speculatively executed instructions. And it won't do work that can't be
undone (like actually writing memory or doing system calls) as part of
speculative execution. Therefore the result of speculatively executed
instructions aren't visible to other cores [1].
Greetings,
Andres Freund
[1] Leaving aside side-channel attacks like spectre / meltdown and a lot of
subsequent issues, but those are observing side channels like presence of
a cacheline in the cpu cache, not actually speculatively written memory
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2025-08-27 23:16:29 | Re: Non-reproducible AIO failure |
Previous Message | Noah Misch | 2025-08-27 23:04:51 | Re: Buffer locking is special (hints, checksums, AIO writes) |