Race condition in InvalidateObsoleteReplicationSlots()

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Fujii Masao <fujii(at)postgresql(dot)org>
Subject: Race condition in InvalidateObsoleteReplicationSlots()
Date: 2021-04-08 00:10:37
Message-ID: 20210408001037.wfmk6jud36auhfqm@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I was looking at InvalidateObsoleteReplicationSlots() while reviewing /
polishing the logical decoding on standby patch. Which lead me to notice that
I think there's a race in InvalidateObsoleteReplicationSlots() (because
ResolveRecoveryConflictWithLogicalSlots has a closely related one).

void
InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno)
{
...
LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
for (int i = 0; i < max_replication_slots; i++)
{
...
if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= oldestLSN)
continue;
LWLockRelease(ReplicationSlotControlLock);
...
for (;;)
{
...
wspid = ReplicationSlotAcquireInternal(s, NULL, SAB_Inquire);
...
SpinLockAcquire(&s->mutex);
s->data.invalidated_at = s->data.restart_lsn;
s->data.restart_lsn = InvalidXLogRecPtr;
SpinLockRelease(&s->mutex);
...

As far as I can tell there's no guarantee that the slot wasn't concurrently
dropped and another replication slot created at the same offset in
ReplicationSlotCtl->replication_slots. Which we then promptly would
invalidate, regardless of the slot not actually needing to be invalidated.

Note that this is different from the race mentioned in a comment:
/*
* Signal to terminate the process that owns the slot.
*
* There is the race condition where other process may own
* the slot after the process using it was terminated and before
* this process owns it. To handle this case, we signal again
* if the PID of the owning process is changed than the last.
*
* XXX This logic assumes that the same PID is not reused
* very quickly.
*/

It's one thing to terminate a connection erroneously - permanently breaking a
replica due to invalidating the wrong slot or such imo is different.

Interestingly this problem seems to have been present both in

commit c6550776394e25c1620bc8258427c8f1d448080d
Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Date: 2020-04-07 18:35:00 -0400

Allow users to limit storage reserved by replication slots

commit f9e9704f09daf882f5a1cf1fbe3f5a3150ae2bb9
Author: Fujii Masao <fujii(at)postgresql(dot)org>
Date: 2020-06-19 17:15:52 +0900

Fix issues in invalidation of obsolete replication slots.

I think this can be solved in two different ways:

1) Hold ReplicationSlotAllocationLock with LW_SHARED across most of
InvalidateObsoleteReplicationSlots(). That way nobody could re-create a new
slot in the to-be-obsoleted-slot's place.

2) Atomically check whether the slot needs to be invalidated and try to
acquire if needed. Don't release ReplicationSlotControlLock between those
two steps. Signal the owner to release the slot iff we couldn't acquire the
slot. In the latter case wait and then recheck if the slot still needs to
be dropped.

To me 2) seems better, because we then can also be sure that the slot still
needs to be obsoleted, rather than potentially doing so unnecessarily.

It looks to me like several of the problems here stem from trying to reuse
code from ReplicationSlotAcquireInternal() (which before this was just named
ReplicationSlotAcquire()). I don't think that makes sense, because cases like
this want to check if a condition is true, and acquire it only if so.

IOW, I think this basically needs to look like ReplicationSlotsDropDBSlots(),
except that a different condition is checked, and the if (active_pid) case
needs to prepare a condition variable, signal the owner and then wait on the
condition variable, to restart after.

Greetings,

Andres Freund

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-04-08 00:17:42 Re: Why is specifying oids = false multiple times in create table is silently ignored?
Previous Message Peter Geoghegan 2021-04-07 23:41:35 Re: New IndexAM API controlling index vacuum strategies