Re: allow online change primary_conninfo

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: sk(at)zsrv(dot)org
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, david(at)pgmasters(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: allow online change primary_conninfo
Date: 2019-09-25 05:19:17
Message-ID: 20190925.141917.160008163.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

At Sat, 21 Sep 2019 13:06:25 +0300, Sergei Kornilov <sk(at)zsrv(dot)org> wrote in <41171569060385(at)myt5-b646bde4b8f3(dot)qloud-c(dot)yandex(dot)net>
> Hello
>
> Thank you for review! Can you please also check v4 version? v5 implements design suggested by Kyotaro Horiguchi-san, while v4 has another design. Which one do you prefer? Or are both wrong?
>
> > I can't parse that comment. What does "skipping to starting" mean? I
> > assume it's just about avoiding wal_retrieve_retry_interval, but I think
> > the comment ought to be rephrased.
>
> Design idea is to rewrite current state from working XLOG_FROM_STREAM to failed XLOG_FROM_ARCHIVE (without actually try this method on this iteration) and immediately go to next iteration to advance the state machine. Next iteration after failed archive recovery is walreceiver. So walreceiver will be stopped just before this lines and started on next iteration. Walreceiver will be restarted, we do not call restore_command

Sorry, it's my bad. It meant "If wal receiver is requested to
restart, change state to XLOG_FROM_STREAM immediately skipping
the next XLOG_FROM_ARCHIVE state.".

> > Also, should we really check this before scanning for new timelines?
>
> Hmm, on the next day... No, this is not really necessary.

No problem when timeline is not changed when explicit restart of
wal receiver. But if it happened just after the standby received
new hisotry file, we suffer an extra restart of wal receiver. It
seems better that we check that in the case.

> > Why is it the right thing to change to XLOG_FROM_ARCHIVE when we're just
> > restarting walreceiver? The server might unnecessarily get stuck in
> > archive based recovery for a long time this way? It seems to me that
> > we'd need to actually trigger RequestXLogStreaming() in this case.
>
> I hope I clarified this in design idea description above.

My suggestion was just to pretend that the next XLOG_FROM_STREAM
failed, but the outcome actually looks wierd as Andres commented.

I think that comes from the structure of the state machine. WAL
receiver is started not at the beginning of XLOG_FROM_STREAM
state but at the end of XLOG_FROM_ARCHIVE. So we need to switch
once to XLOG_FROM_ARCHIVE in order to start wal receiver.

So, I'd like to propose to move the stuff to the second switch().
(See the attached incomplete patch.) This is rather similar to
Sergei's previous proposal, but the structure of the state
machine is kept.

Some other comments are below.

In ProcessStartupSigHup, conninfo and slotname don't need to be
retained until the end of the function.

The log message in the function seems to be too detailed. On the
other hand, if we changed primary_conninfo to '' (stop) or vise
versa (start), the message (restart) looks strange.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
incomplete_xlog_c_change.patch text/x-patch 5.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-09-25 05:36:41 Re: Fix example in partitioning documentation
Previous Message Julien Rouhaud 2019-09-25 05:03:52 Re: Hypothetical indexes using BRIN broken since pg10