pgsql: Fix, or at least ameliorate, bugs in logicalrep_worker_launch().

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Fix, or at least ameliorate, bugs in logicalrep_worker_launch().
Date: 2017-09-18 15:40:08
Message-ID: E1dty9Y-00022T-KB@gemulon.postgresql.org
Lists: pgsql-committers

Fix, or at least ameliorate, bugs in logicalrep_worker_launch().

If we failed to get a background worker slot, the code just walked
away from the logicalrep-worker slot it already had, leaving that
looking like the worker is still starting up. This led to an indefinite
hang in subscription startup, as reported by Thomas Munro. We must
release the slot on failure.
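
To illustrate the failure mode, here is a minimal toy model in C, not the
actual launcher.c code. Every name in it (ToyWorkerSlot, toy_worker_launch,
toy_register_bgworker, and so on) is hypothetical. It only shows the pattern:
a slot claimed before background worker registration must be given back if
that registration fails, or it keeps looking like a worker that is still
starting up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real PostgreSQL structures. */
#define MAX_TOY_WORKERS 2

typedef struct ToyWorkerSlot
{
    bool in_use;    /* slot has been claimed by a launch attempt */
    bool attached;  /* worker has attached; false means "still starting" */
    int  subid;     /* subscription this slot serves */
} ToyWorkerSlot;

static ToyWorkerSlot toy_slots[MAX_TOY_WORKERS];

/* Simulates background worker registration; fails when none are free. */
static bool
toy_register_bgworker(int *bgw_slots_free)
{
    if (*bgw_slots_free <= 0)
        return false;
    (*bgw_slots_free)--;
    return true;
}

/* Reset a slot so it no longer looks like a worker that is starting up. */
static void
toy_slot_cleanup(ToyWorkerSlot *slot)
{
    assert(slot != NULL);
    slot->in_use = false;
    slot->attached = false;
    slot->subid = 0;
}

/* Returns the claimed slot, or NULL if no worker could be launched. */
static ToyWorkerSlot *
toy_worker_launch(int subid, int *bgw_slots_free)
{
    ToyWorkerSlot *slot = NULL;

    /* Find and claim a free slot. */
    for (int i = 0; i < MAX_TOY_WORKERS; i++)
    {
        if (!toy_slots[i].in_use)
        {
            slot = &toy_slots[i];
            break;
        }
    }
    if (slot == NULL)
        return NULL;

    slot->in_use = true;
    slot->attached = false;
    slot->subid = subid;

    if (!toy_register_bgworker(bgw_slots_free))
    {
        /* Give the slot back instead of walking away from it. */
        toy_slot_cleanup(slot);
        return NULL;
    }
    return slot;
}

/* Count slots that are claimed but whose worker has not attached yet. */
static int
toy_count_starting(void)
{
    int n = 0;

    for (int i = 0; i < MAX_TOY_WORKERS; i++)
    {
        if (toy_slots[i].in_use && !toy_slots[i].attached)
            n++;
    }
    return n;
}
```

Without the toy_slot_cleanup() call on the failure path, a failed launch
would leave toy_count_starting() stuck at 1, which is the toy equivalent of
the hang described above.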

Also fix a thinko: we must capture the worker slot's generation before
releasing LogicalRepWorkerLock the first time, else testing to see if
it's changed is pretty meaningless.
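
The generation point can be sketched the same way. This is again a toy model
under stated assumptions, with hypothetical names (ToySlot, toy_claim,
toy_still_ours) and no real locking. It assumes the generation counter is
bumped each time the slot is handed to a new worker, so the value has to be
read at claim time, while the lock would still be held. A value read later
may already belong to whoever reused the slot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a worker slot; not the real structure. */
typedef struct ToySlot
{
    bool     in_use;
    uint16_t generation;  /* bumped each time the slot gets a new owner */
} ToySlot;

/*
 * Claim the slot and return the generation seen at claim time.  In real
 * code this would all happen before the lock is released the first time.
 */
static uint16_t
toy_claim(ToySlot *slot)
{
    slot->in_use = true;
    slot->generation++;
    return slot->generation;
}

/* Give the slot up so that another launch attempt may reuse it. */
static void
toy_release(ToySlot *slot)
{
    slot->in_use = false;
}

/* True if the slot still belongs to the launch that captured `generation`. */
static bool
toy_still_ours(const ToySlot *slot, uint16_t generation)
{
    return slot->in_use && slot->generation == generation;
}
```

If the caller instead read slot->generation only after the slot had been
released and claimed again, the comparison in toy_still_ours() would always
succeed, since it would be comparing the new owner's generation with itself.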

BTW, the CHECK_FOR_INTERRUPTS() in WaitForReplicationWorkerAttach is a
ticking time bomb, even without considering the possibility of elog(ERROR)
in one of the other functions it calls. Really, this entire business needs
a redesign with some actual thought about error recovery. But for now
I'm just band-aiding the case observed in testing.

Back-patch to v10 where this code was added.

Discussion: https://postgr.es/m/CAEepm=2bP3TBMFBArP6o20AZaRduWjMnjCjt22hSdnA-EvrtCw@mail.gmail.com

Branch
------
REL_10_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/c1bde0747983993a695d12c4403a730b2be579d2

Modified Files
--------------
src/backend/replication/logical/launcher.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
