pgsql: Fix race condition during replication origin drop.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Fix race condition during replication origin drop.
Date: 2018-01-09 17:09:40
Message-ID: E1eYxPA-0004nm-72@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

Fix race condition during replication origin drop.

replorigin_drop() misunderstood the API for condition variables: it
had ConditionVariablePrepareToSleep and ConditionVariableCancelSleep
inside its test-and-sleep loop, rather than outside the loop as
intended. The net effect is a narrow race-condition window wherein,
if the process using a replication slot releases it immediately after
replorigin_drop() releases the ReplicationOriginLock, replorigin_drop()
would get into the condition variable's wait list too late and then
wait indefinitely for a signal that won't come.

Because there's a different CV for each replication slot, we can't
just move the ConditionVariablePrepareToSleep call to above the
test-and-sleep loop. What we can do, in the wake of commit 13db3b936,
is drop the ConditionVariablePrepareToSleep call entirely. This fix
depends on that commit because (at least in principle) the slot matching
the target replication origin might move around, so that once in a blue
moon successive loop iterations might involve different CVs. We can now
cope with such a scenario, at the cost of an extra trip through the
retry loop.

(There are ways we could fix this bug without depending on that commit,
but they're all a lot more complicated than this way.)

While at it, upgrade the rather skimpy comments in this function.

Back-patch to v10 where this code came in.

Discussion: https://postgr.es/m/19947.1515455433@sss.pgh.pa.us

Branch
------
REL_10_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/1f5adbd799cf5bf98259003840c69c65c6b021db

Modified Files
--------------
src/backend/replication/logical/origin.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)

Browse pgsql-committers by date

  From Date Subject
Next Message Peter Eisentraut 2018-01-09 17:32:39 pgsql: Fix ssl tests for when tls-server-end-point is not supported
Previous Message Tom Lane 2018-01-09 16:39:23 pgsql: Allow ConditionVariable[PrepareTo]Sleep to auto-switch between C