Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Sergei Kornilov <sk(at)zsrv(dot)org>, Donald Dong <xdong(at)csumb(dot)edu>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "simon(at)2ndquadrant(dot)com" <simon(at)2ndquadrant(dot)com>, "ams(at)2ndquadrant(dot)com" <ams(at)2ndquadrant(dot)com>, "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
Date: 2019-01-11 17:52:08
Message-ID: CA+TgmoZVmJX1+QTWw2tSnPHrnkwKZxC3ZsRynFB-Fpzm1Oxuhw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 10, 2019 at 7:42 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I still think this whole direction of accessing the GUC in walreceiver
> is a bad idea and shouldn't be pursued further. There's definite
> potential for startup process and WAL receiver having different states
> of GUCs, the code doesn't get meaningfully simpler, the GUC value checks
> in walreceiver make for horrible reporting up the chain.

I'm trying to assess the validity of this complaint. It seems to me
that it might advance the discussion to be more clear about what the
problem is here. When pg_ctl reload is executed (or the equivalent),
different backends reload the file at different times. Currently,
because the values are explicitly passed down from the startup process
to the walreceiver, it's guaranteed that they will both use the same
value. With this change, one might see an older value and the other
the current value. So there's room to be nervous about code like
this:

if (PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0)
{
... assorted other code ...
RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
PrimarySlotName);
receivedUpto = 0;
}

With the patch, the PrimaryConnInfo and PrimarySlotName arguments are
removed from RequestXLogStreaming. That means that the new
walreceiver could come to a different conclusion about the values of
those arguments than the startup process. In particular, it could end
up thinking that primary_conninfo is empty when, if the startup
process had thought that, the walreceiver would never have been
launched in the first place. But it's not obvious that you've added
any logic in WALReceiverMain or elsewhere to compensate for this
possibility -- what would happen in that case? Would we crash?
Connect to the wrong server?

I might be wrong here, but I'm inclined to think that this scenario
hasn't really been contemplated carefully by the patch authors. There
are no added TAP tests for the scenario where the values differ
between the two processes, no code comments which explain why it's OK
if that happens, really no mention of it in the patch at all. And on
that basis I'm inclined to think that Andres is really quite correct
to be worried about this. The problem he's talking about here is very
low-probability because the race condition is narrow, but it's real,
and it surely needs to be handled somehow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-01-11 17:55:54 Re: Ryu floating point output patch
Previous Message Andrew Gierth 2019-01-11 17:33:19 Re: Ryu floating point output patch