Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Sergei Kornilov <sk(at)zsrv(dot)org>, Donald Dong <xdong(at)csumb(dot)edu>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "simon(at)2ndquadrant(dot)com" <simon(at)2ndquadrant(dot)com>, "ams(at)2ndquadrant(dot)com" <ams(at)2ndquadrant(dot)com>, "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
Date: 2019-01-16 01:48:24
Message-ID: 20190116014824.GJ1433@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 12, 2019 at 08:10:07AM +0900, Michael Paquier wrote:
> On Fri, Jan 11, 2019 at 12:52:08PM -0500, Robert Haas wrote:
>> With the patch, the PrimaryConnInfo and PrimarySlotName arguments are
>> removed from RequestXLogStreaming. That means that the new
>> walreceiver could come to a different conclusion about the values of
>> those arguments than the startup process. In particular, it could end
>> up thinking that primary_conninfo is empty when, if the startup
>> process had thought that, the walreceiver would never have been
>> launched in the first place. But it's not obvious that you've added
>> any logic in WALReceiverMain or elsewhere to compensate for this
>> possibility -- what would happen in that case? Would we crash?
>> Connect to the wrong server?
>
> If I contemplate the patch this morning there is this bit:
> @@ -291,32 +295,40 @@ WalReceiverMain(void)
> /* Unblock signals (they were blocked when the postmaster forked
> us) */
> PG_SETMASK(&UnBlockSig);
>
> + /*
> + * Fail immediately if primary_conninfo goes missing, better safe than
> + * sorry.
> + */
> + if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("cannot connect to the primary server as \"primary_conninfo\" is not defined")));
>
> So the answer to your question is: the WAL receiver fails to start.

Robert, does this answer your question? I agree that depending on the
timing, we could finish by having the startup process spawning a WAL
receiver with a given connection string, and that the WAL receiver
could use a different one, but as long as we fail the WAL receiver
startup this does not seem like an issue to me, so I disagree with the
upthread statement that the patch author has not thought about such
cases :)

It seems to me that making the WAL receiver relying more on the GUC
context makes it more robust when it comes to reloading the parameters
which would be an upcoming change as we need to rely on the WAL
receiver check the GUC context itself and FATAL (or we would need to
have the startup process check for a change in
primary_conninfo/primary_slot_name, and then have the startup process
kill the WAL receiver by itself). In short, handling all that in the
WAL receiver would be more robust in the long term in my opinion as we
reduce interactions between the WAL receiver and the startup process.
And on top of it we get rid of ready_to_display, which is something I
am unhappy with since we had to introduce it.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-01-16 01:54:28 Re: O_DIRECT for relations and SLRUs (Prototype)
Previous Message Michael Paquier 2019-01-16 01:39:29 Re: Log a sample of transactions