Race condition in SyncRepGetSyncStandbysPriority

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Race condition in SyncRepGetSyncStandbysPriority
Date: 2020-03-27 01:26:49
Message-ID: 21519.1585272409@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Twice in the past month [1][2], buildfarm member hoverfly has managed
to reach the "unreachable" Assert(false) at the end of
SyncRepGetSyncStandbysPriority.

What seems likely to me, after quickly eyeballing the code, is that
hoverfly is hitting the blatantly-obvious race condition in that function.
Namely, that the second loop supposes that the state of the walsender
array hasn't changed since the first loop.

The minimum fix for this, I suppose, would have the first loop capture
the sync_standby_priority value for each walsender along with what it's
already capturing. But I wonder if the whole function shouldn't be
rewritten from scratch, because it seems like the algorithm is both
expensively brute-force and unintelligible, which is a sad combination.
It's likely that the number of walsenders would never be high enough
that efficiency could matter, but then couldn't we use an algorithm
that is less complicated and more obviously correct? (Because the
alternative conclusion, if you reject the theory that a race is happening,
is that the algorithm is just flat out buggy; something that's not too
easy to disprove either.)

Another fairly dubious thing here is that whether or not *am_sync
gets set depends not only on whether MyWalSnd is claiming to be
synchronous but on how many lower-numbered walsenders are too.
Is that really the right thing?

But worse than any of that is that the return value seems to be
a list of walsender array indexes, meaning that the callers cannot
use it without making even stronger assumptions about the array
contents not having changed since the start of this function.

It sort of looks like the design is based on the assumption that
the array contents can't change while SyncRepLock is held ... but
if that's the plan then why bother with the per-walsender spinlocks?
In any case this assumption seems to be failing, suggesting either
that there's a caller that's not holding SyncRepLock when it calls
this function, or that somebody is failing to take that lock while
modifying the array.

regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2020-02-29%2001%3A34%3A55
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2020-03-26%2013%3A51%3A15

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-03-27 01:31:08 Re: Memory-Bounded Hash Aggregation
Previous Message Kartyshov Ivan 2020-03-27 01:15:59 Re: [HACKERS] make async slave to wait for lsn to be replayed