Re: Streaming replication failover with 3 servers

From: Sergey Konoplev <sergey(dot)konoplev(at)postgresql-consulting(dot)com>
To: Stuart Bishop <stuart(at)stuartbishop(dot)net>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Streaming replication failover with 3 servers
Date: 2012-08-09 10:25:06
Message-ID: CAL_0b1syW8ayEOjZZg6+C4XuBzJh5QQ7Q6xu+arA3emgOqd5Cg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi,

Below is the process I use.

On Wed, Aug 8, 2012 at 7:51 PM, Stuart Bishop <stuart(at)stuartbishop(dot)net> wrote:
> I'm trying to work out failover and disaster recovery procedures for a
> cluster of three servers. Streaming replication is being used with a high
> wal_keep_segments, no log shipping is happening. I need to avoid the
> several hours it takes to rebuild a hot standby from scratch.

Let us suppose that there is a hot standby replication set up in a
cluster. The db1 server is a master and dbX are replicas. Also suppose
that we need to do a failover on one of our slaves.

First we need to define what replica will be a new master.

In case of the master failure you need to find the most caught up
replica. To do this compare WAL replay locations on replicas and chose
the one with the biggest value.

postgres=# SELECT pg_last_xlog_replay_location();
pg_last_xlog_replay_location
------------------------------
BAA/37DA2888
(1 row)

Note that If you chose not the most caught up one than other replicas
that have replayed later WAL entries must be reconfigured from
scratch. Otherwise their data can be corrupted and you will not get
any warnings about it.

In case of the planned switchover choose one that will be a new master
on your own.

Then stop all the slaves except the new master. Let say the remaining
slave is db2.

Use the command below to help the master and the remaining slave get
ahead of other slaves if you are not sure that they already are. The
command creates a minimal WAL entry.

postgres=# SELECT txid_current();

Now touch the failover file on the remaining slave to promote it as a
new master.

On the stopped slaves delete everything in the pg_xlog directory.

postgres(at)dbX: ~ $ rm -rf /db/data/pg_xlog/*

And copy the history file from the new master.

postgres(at)dbX: ~ $ scp db2:/db/data/pg_xlog/*.history /db/data/pg_xlog/

Then change the DSN to the new master and add the following
instruction in recovery.conf. This will make replicas to follow the
latest created timeline.

recovery_target_timeline = 'latest'

Now start the postgres service and wait until it is synced up to the new master.

Rebuild the old master as a replica from scratch.

>
> ServerA is the master.
> ServerB is a streaming hot standby and prefered failover server.
> ServerC is a streaming hot standby.
>
>
> For a planned failover, maintenance on ServerA:
>
> 1. Shutdown ServerB & ServerC
> 2. Shutdown ServerA
> 3. Copy pg_xlog from ServerA to ServerB and ServerC
> 4. Reconfigure ServerB as master, start it up.
> 5. Reconfigure ServerC as streaming hot standby of ServerB. Start it.
> 6. After maintenance, reconfigure ServerA as streaming hot standby
> of ServerB. Start it.
>
> For an unplanned failover, ServerA has exploded:
>
> 1. Run 'SELECT pg_last_xlog_receive_location()' on ServerB and ServerC,
> determining which is most up to date.
> 2. Shutdown ServerB and ServerC
> 3. If ServerC is more up to date, copy pg_xlog from ServerC to ServerB.
> 4. Reconfigure ServerB as master, start it up.
> 5. Reconfigure ServerC as streaming hot standby of ServerB, start it up.
>
>
> Does this look correct to people?
>
> Am I going to end up in trouble copying files into pg_xlog like this on a
> busy system?
>
> Is it overengineered? eg. will a master ensure everything is streamed to
> connected hot standbys before a graceful shutdown?
>
> --
> Stuart Bishop <stuart(at)stuartbishop(dot)net>
> http://www.stuartbishop.net/
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin

--
Sergey Konoplev

a database architect, software developer at PostgreSQL-Consulting.com
http://www.postgresql-consulting.com

Jabber: gray(dot)ru(at)gmail(dot)com Skype: gray-hemp Phone: +79160686204

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Sergey Konoplev 2012-08-09 10:53:54 Re: How to make the row changes inside trigger function visible to the top level sql statement?
Previous Message Craig Ringer 2012-08-09 01:33:55 Re: PostgreSQL oom_adj postmaster process to -17