Re: allow online change primary_conninfo

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Sergei Kornilov <sk(at)zsrv(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: allow online change primary_conninfo
Date: 2019-07-30 06:42:39
Message-ID: 20190730064239.GJ1742@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 01, 2019 at 02:33:39PM +0300, Sergei Kornilov wrote:
> Updated version attached. Merge conflict was about tests count in
> 001_stream_rep.pl. Nothing else was changed. My approach can be
> still incorrect, any redesign ideas are welcome. Thanks in advance!

It has been some time, and I am finally catching up with this patch.

+ <para>
+ WAL receiver will be restarted after <varname>primary_slot_name</varname>
+ was changed.
</para>
The sentence sounds strange. Here is a suggestion:
The WAL receiver is restarted after an update of primary_slot_name (or
primary_conninfo).

The comment at the top of the call of ProcessStartupSigHup() in
HandleStartupProcInterrupts() needs to be updated as it mentions a
configuration file re-read, but that's not the case anymore.

pendingRestartSource's name does not represent what it does, as it is
only associated with the restart of a WAL receiver when in streaming
state, and that's a no-op for the archive mode and the local mode.

So, the patch splits the steps taken when checking for a WAL source by
adding an extra step after the failure handling that you are calling
the restart step. When a failure happens for the stream mode
(shutdown of WAL receiver, promotion. etc), there is a switch to the
archive mode, and nothing is changed in this case in your patch. So
when shutting down the WAL receiver after a parameter change, what
happens is that the startup process waits for retrieve_retry_interval
before moving back to the archive mode. Only after scanning again the
archives do we restart a new WAL receiver. However, if the restart of
the WAL receiver is planned because of an update of primary_conninfo
(or slot), shouldn't the follow-up mode be XLOG_FROM_STREAM without
waiting for wal_retrieve_retry_interval ms for extra WAL to become
available?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-07-30 06:50:29 Re: POC: Cleaning up orphaned files using undo logs
Previous Message Craig Ringer 2019-07-30 06:13:40 Re: tap tests driving the database via psql