Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Sergei Kornilov <sk(at)zsrv(dot)org>, Donald Dong <xdong(at)csumb(dot)edu>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "simon(at)2ndquadrant(dot)com" <simon(at)2ndquadrant(dot)com>, "ams(at)2ndquadrant(dot)com" <ams(at)2ndquadrant(dot)com>, "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
Date: 2019-01-11 23:10:07
Message-ID: 20190111231007.GA24889@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 11, 2019 at 12:52:08PM -0500, Robert Haas wrote:
> With the patch, the PrimaryConnInfo and PrimarySlotName arguments are
> removed from RequestXLogStreaming. That means that the new
> walreceiver could come to a different conclusion about the values of
> those arguments than the startup process. In particular, it could end
> up thinking that primary_conninfo is empty when, if the startup
> process had thought that, the walreceiver would never have been
> launched in the first place. But it's not obvious that you've added
> any logic in WALReceiverMain or elsewhere to compensate for this
> possibility -- what would happen in that case? Would we crash?
> Connect to the wrong server?

If I contemplate the patch this morning there is this bit:
@@ -291,32 +295,40 @@ WalReceiverMain(void)
/* Unblock signals (they were blocked when the postmaster forked
us) */
PG_SETMASK(&UnBlockSig);

+ /*
+ * Fail immediately if primary_conninfo goes missing, better safe than
+ * sorry.
+ */
+ if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot connect to the primary server as \"primary_conninfo\" is not defined")));

So the answer to your question is: the WAL receiver fails to start.

> I might be wrong here, but I'm inclined to think that this scenario
> hasn't really been contemplated carefully by the patch authors. There
> are no added TAP tests for the scenario where the values differ
> between the two processes, no code comments which explain why it's OK
> if that happens, really no mention of it in the patch at all. And on
> that basis I'm inclined to think that Andres is really quite correct
> to be worried about this. The problem he's talking about here is very
> low-probability because the race condition is narrow, but it's real,
> and it surely needs to be handled somehow.

primary_conninfo and primary_slot_name are PGC_POSTMASTER now, so
adding tests now don't really make sense.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-01-11 23:33:54 Re: Three animals fail test-decoding-check on REL_10_STABLE
Previous Message Andrew Gierth 2019-01-11 23:04:51 declaration-after-statement (was Re: Ryu floating point output patch)