Re: When pg_rewind success, the database can't startup

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: hemin <min(dot)he(at)ww-it(dot)cn>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: When pg_rewind success, the database can't startup
Date: 2018-06-19 06:59:40
Message-ID: 20180619065940.GA31737@paquier.xyz
Lists: pgsql-bugs

On Thu, Jun 14, 2018 at 05:30:20PM +0800, hemin wrote:
> There is a primary/standby cluster with async replication. While a large
> amount of data is being inserted into the primary node, we stop the
> database by hand.

How do you stop it?

> Then we promote the standby node to be the new primary node and insert
> new data into it. Finally, pg_rewind succeeds in resolving the diverged
> WAL, but the node cannot start up, failing with the following error:

That looks like a correct flow, roughly. Did you issue a manual
checkpoint on the promoted standby before running pg_rewind? That's
necessary to avoid confusing pg_rewind, which uses the on-disk data of
the source's control file for sanity checks.
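The order of operations suggested here (checkpoint on the promoted standby first, then pg_rewind against it) can be sketched as below. This is a minimal Python sketch that only builds the command lines and does not run them; the host name, port, and data directory are hypothetical placeholders, and it assumes the old primary has already been shut down cleanly and that the connecting user has the privileges pg_rewind needs.

```python
import shlex
from typing import List


def rewind_commands(promoted_host: str, promoted_port: int,
                    old_primary_pgdata: str) -> List[List[str]]:
    """Return, in order, the commands for rewinding the old primary.

    All arguments are placeholders supplied by the caller (hypothetical
    values, not taken from the report); nothing is executed here.
    """
    conninfo = f"host={promoted_host} port={promoted_port} dbname=postgres"
    return [
        # 1. On the promoted standby: force a checkpoint so that its
        #    on-disk control file reflects the new timeline before
        #    pg_rewind reads it.
        ["psql", "-d", conninfo, "-c", "CHECKPOINT"],
        # 2. Against the old primary's (cleanly stopped) data directory:
        #    rewind it using the promoted standby as the source.
        ["pg_rewind", "--target-pgdata", old_primary_pgdata,
         "--source-server", conninfo, "--progress"],
    ]


if __name__ == "__main__":
    for argv in rewind_commands("standby.example", 5432, "/srv/pg/old_primary"):
        print(shlex.join(argv))
```

Keeping the commands as argument lists makes it easy to hand them to a subprocess runner later, or just print them for review as done above.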

> 2018-06-06 14:40:18.686 CST [2687] FATAL:  requested timeline 3 does
> not contain minimum recovery point 0/DB35BE80 on timeline 1

This means that the minimum recovery point of the instance being
recovered is not part of the history of the timeline you are trying to
link to. In short, the timeline history of your nodes may have been
messed up.
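To see why startup refuses to continue, here is a simplified Python model of the check behind that message: walk the target timeline's history file, find which timeline owns the minimum recovery point, and compare it with the timeline recorded for that point. This is a sketch written for illustration, not PostgreSQL's own code; the function names are hypothetical, and the switch LSNs used with it must come from the real 00000003.history file (the ones in any example are made up).

```python
from typing import List, Optional, Tuple

# (timeline ID, first LSN on it, LSN where it was left; None = still current)
Range = Tuple[int, int, Optional[int]]


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'X/Y' (two hex halves) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_history(target_tli: int, history: str) -> List[Range]:
    """Turn the body of a <target_tli>.history file into LSN ranges.

    Each non-comment line reads 'parent_tli <TAB> switch_lsn <TAB> reason'.
    The target timeline begins at the last switch point and has no end.
    """
    ranges: List[Range] = []
    begin = 0
    for line in history.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 2)
        tli, switch = int(fields[0]), parse_lsn(fields[1])
        ranges.append((tli, begin, switch))
        begin = switch
    ranges.append((target_tli, begin, None))
    return ranges


def tli_of_point(ptr: int, ranges: List[Range]) -> Optional[int]:
    """Return the timeline that owns LSN `ptr` in this history, if any."""
    for tli, begin, end in ranges:
        if begin <= ptr and (end is None or ptr < end):
            return tli
    return None


def contains_min_recovery_point(target_tli: int, history: str,
                                min_recovery_point: str,
                                min_recovery_tli: int) -> bool:
    """True if the target timeline's history places the minimum recovery
    point on `min_recovery_tli`; False corresponds to the FATAL error."""
    ranges = parse_history(target_tli, history)
    # Look at the byte just before the point, so that a minimum recovery
    # point sitting exactly on a switch LSN still counts as the old timeline.
    ptr = parse_lsn(min_recovery_point) - 1
    return tli_of_point(ptr, ranges) == min_recovery_tli
```

If the history says timeline 1 was left before 0/DB35BE80, the point belongs to a later timeline and the function returns False, which is the situation reported above.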

> (4) Standby Node: promote the standby node to be primary:

Here you should issue a checkpoint manually on the promoted standby.

> (5) Standby Node: insert 3,000,000 rows of data into the database
> using pgbench:

You should also be careful that the previous master, also known as the
instance which has been rewound and which you are trying to plug back
into the cluster, also needs the WAL segments starting from the last
checkpoint before WAL forked onto the new timeline.
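To check whether those segments are still available (in pg_wal, or pg_xlog before version 10, or in the archive), it helps to translate LSNs into WAL file names. The Python sketch below assumes the default 16 MB segment size; the helper names are hypothetical, and the checkpoint LSN is an input you have to look up yourself (for example, a REDO location reported by pg_controldata).

```python
from typing import List

DEFAULT_SEG_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'X/Y' (two hex halves) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(tli: int, lsn: int, seg_size: int = DEFAULT_SEG_SIZE) -> str:
    """Name of the WAL segment on timeline `tli` that contains `lsn`."""
    segno = lsn // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (tli, segno // segs_per_xlogid,
                             segno % segs_per_xlogid)


def segments_between(tli: int, start_lsn: int, end_lsn: int,
                     seg_size: int = DEFAULT_SEG_SIZE) -> List[str]:
    """All segment names on `tli` covering start_lsn..end_lsn inclusive."""
    first = start_lsn // seg_size
    last = end_lsn // seg_size
    return [wal_file_name(tli, segno * seg_size, seg_size)
            for segno in range(first, last + 1)]
```

For instance, the minimum recovery point from the error above, 0/DB35BE80 on timeline 1, lives in segment 0000000100000000000000DB. Segments written after the fork carry the new timeline ID in their names instead, so this only covers the old-timeline part of the range.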

Which version of Postgres is this? 9.5? If so, pg_rewind in 9.5 is
very primitive in the way it handles timeline jumps, and 9.6 got much
smarter.
--
Michael
