Re: pg_rewind vs superuser

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(at)paquier(dot)xyz
Cc: magnus(at)hagander(dot)net, mbanck(at)gmx(dot)net, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_rewind vs superuser
Date: 2019-04-08 07:14:42
Message-ID: 20190408.161442.167396698.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Mon, 8 Apr 2019 15:17:25 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in <20190408061725(dot)GF2712(at)paquier(dot)xyz>
> On Sun, Apr 07, 2019 at 03:06:56PM +0200, Magnus Hagander wrote:
> > So can we *detect* that this is the case? Because if so, we could perhaps
> > just wait for it to be done? Because there will always be one?
>
> Yes, this one is technically possible. We could add a timeout option
> which checks each N seconds the control file of the online source and
> sees if its timeline differs or not with the target, waiting for the
> change to happen. If we do that, we may want to revisit the behavior
> of not issuing an error if the source and the target are detected as
> being on the same timeline, and consider it as a failure.
>
> > The main point is -- we know from experience that it's pretty fragile to
> > assume the user read the documentation :) So if we can find *any* way to
> > handle this in code rather than docs, that'd be great. We would still
> > absolutely want the docs change for back branches of course.
>
> Any veeeeery recent experience on the matter perhaps? :)
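
The polling idea quoted above could look roughly like the following. This is only a sketch under assumptions, not proposed pg_rewind code: wait_for_timeline_switch, get_source_tli, and the sleep/clock parameters are hypothetical names, and how the source's TLI is actually read from its control data is left to the caller.

```python
import time
from typing import Callable


def wait_for_timeline_switch(
    get_source_tli: Callable[[], int],   # hypothetical: returns the source's current TLI
    target_tli: int,
    timeout: float,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll the source every `interval` seconds until its TLI differs
    from `target_tli`. Return True on a switch, False on timeout."""
    deadline = clock() + timeout
    while True:
        # Always check at least once, even with timeout=0.
        if get_source_tli() != target_tli:
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

A caller that follows the suggestion of treating "still on the same timeline" as a failure would report an error when this returns False instead of exiting successfully.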

I am not Magnus, but I have seen a similar, slightly different
case. Just after the master was promoted, the standby was killed
in immediate mode after it had caught up to the master's latest
TLI but before its restartpoint finished. The control data of the
two servers showed different TLIs, so *the tool* decided to try
pg_rewind. The restart->shutdown sequence (*1) run for cleanup
brought the standby onto the master's TLI, but their histories
had diverged from each other within that latest TLI. Of course,
pg_rewind said "no need to rewind since they're on the same
TLI". The subsequent replication started from the beginning of
the segment and overwrote WAL records already applied on the
standby. The result was a broken database. I suspect this was the
result of some kind of misoperation and that sane operation would
not cause it, but such a situation could be "cleaned up" if
pg_rewind also did its work for a replication set on the same TLI.
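
To make the failure mode concrete, here is a toy model of it. Everything in it is an assumption for illustration: Node, the integer WAL positions, and both functions are hypothetical and say nothing about how pg_rewind is actually implemented. It only shows that a check on the TLI alone can report "nothing to do" while the two WAL streams differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    control_tli: int                                   # TLI recorded in control data
    wal: Dict[int, str] = field(default_factory=dict)  # toy WAL: position -> record


def tli_check_says_rewind(source: Node, target: Node) -> bool:
    """TLI-only decision: rewind only if the TLIs differ."""
    return source.control_tli != target.control_tli


def first_divergence(source: Node, target: Node) -> Optional[int]:
    """Lowest position present on both nodes where the records differ."""
    for pos in sorted(source.wal.keys() & target.wal.keys()):
        if source.wal[pos] != target.wal[pos]:
            return pos
    return None


# Both nodes report TLI 2, but the standby wrote its own record at 102.
master = Node(2, {100: "A", 101: "B", 102: "C"})
standby = Node(2, {100: "A", 101: "B", 102: "X"})

needs_rewind = tli_check_says_rewind(master, standby)
diverged_at = first_divergence(master, standby)
```

In this model needs_rewind is False although diverged_at points at position 102, which is the situation described above.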

I haven't yet found out exactly what happened in that case.

*1: It is somewhat strange that recovery reached the next TLI
even though I heard that the restart was done in non-standby,
non-recovery mode. Something must be wrong there.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
