Re: pg_rewind succeed but postgresql showing error when trying to make standby with common ancestor

From: niraj nandane <niraj(dot)nandane(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: pg_rewind succeed but postgresql showing error when trying to make standby with common ancestor
Date: 2022-12-16 06:03:02
Message-ID: CALpWO+ARv0pzBD-RbMtPUO-XMYDN3MKnkmQZ3eCOXkeT4K4PLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

<<
In this case, if you want to make node3 a standby of node2, just
disconnect (shutdown) node3 first, then promote node2, at last start
node3 as a standby of node2. No need for pg_rewind since node3 is
surely in the past of node2. (But a large replication delay might
badly affect.) pg_rewind doesn't work also in this case since node3
is likely be at before node2's promotion LSN.
>>
Both the case with and without pg_rewind working. Here is output of
pg_rewind:

pg_rewind: connected to server
pg_rewind: servers diverged at WAL location 0/316DD838 on timeline 1
pg_rewind: rewinding from last common checkpoint at 0/2E000060 on timeline 1
pg_rewind: reading source file list
pg_rewind: reading target file list
pg_rewind: reading WAL in target
pg_rewind: need to copy 98 MB (total source directory size is 219 MB)
0/100453 kB (0%) copied
100453/100453 kB (100%) copied
pg_rewind: creating backup label and updating control file
pg_rewind: syncing target data directory
pg_rewind: Done!

<<
Yeah. But I think pg_rewind is not needed at all in this case.
>>
When i promote node2 and node3 having common ancestor, and when joining
node3 to the node2, pg_rewind should return code non-zero.
What pg_rewind does it, gives below and set return code as 0.

pg_rewind: connected to server
pg_rewind: source and target cluster are on the same timeline
pg_rewind: no rewind required

Can this be done? Can pg_rewind return code as non-zero.

Comment from Laurenz Albe at:
https://dba.stackexchange.com/questions/320997/pg-rewind-succeed-but-postgresql-showing-error-when-trying-to-make-standby-with?noredirect=1#comment626006_320997

I would say that it is a bug that pg_rewind returns 0. It should not only
check the time line number, but also compare the time the time line
branched off. You may want to complain to the pgsql-hackers or pgsql-bugs
mailing lists about that

Thanks a lot.

On Fri, Dec 16, 2022 at 7:41 AM Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
wrote:

> Hello.
>
> At Thu, 15 Dec 2022 15:03:28 +0530, niraj nandane <niraj(dot)nandane(at)gmail(dot)com>
> wrote in
> > Have 3 node setup: node1(50.2), node2(50.3) and node3(50.4). node1(50.2)
> is
> > primary and node2(50.3) and node3(50.4) are standby. I promoted node3
> first
> > and then node2 to make them independent. When i try to make the node3
> slave
> > of node2, using pg_rewind, it says below:
>
> pg_rewind is intended to be used for the case where new primary has
> been promoted in the past of new standby (called "diverging") that has
> not promoted. Thus this is not the case.
>
> In this case, if you want to make node3 a standby of node2, just
> disconnect (shutdown) node3 first, then promote node2, at last start
> node3 as a standby of node2. No need for pg_rewind since node3 is
> surely in the past of node2. (But a large replication delay might
> badly affect.) pg_rewind doesn't work also in this case since node3
> is likely be at before node2's promotion LSN.
>
> > After this, when i start postgresql on the node3 in standby mode, i get
> > below:
>
> ...(edited)
> > Dec 15 07:57:33 LOG: started streaming WAL from primary at 0/2F000000
> on timeline 2
> > Dec 15 07:57:33 FATAL: could not receive data from WAL stream: ERROR:
> requested WAL segment 00000002000000000000002F has already been removed
> (repeats)
>
> I'm not sure how long the interval between the promotions of node2 and
> node3, but since node2 is likely to be advanced than node3, it's not
> surprising if node2 has been removed a segment that node3 is at.
>
> > There is no WAL archiving set. This is PostgreSQL 14.4. Does this means
> > PostgreSQL pg_rewind can't be used when forming cluster between two
> > independent primary having common ancestor?
>
> Yes, as explained above.
>
> > Note: If i just promote node2 and run pg_rewind on node3 to join with
> > node2, then it works correctly.
>
> Yeah. But I think pg_rewind is not needed at all in this case.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>

--

Thanks,
Niraj Nandane,
Veritas LLC, Pune

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Peter Eisentraut 2022-12-16 08:07:36 Re: BUG #17720: pg_dump creates a dump with primary key that cannot be restored, when specifying 'using index ...'
Previous Message Peter Geoghegan 2022-12-16 04:39:54 Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)