Multiple Slave Failover with PITR

From: Ken Brush <kbrush(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Multiple Slave Failover with PITR
Date: 2012-03-27 17:47:48
Message-ID: CANCJzPaSmhsUKgakCoHHwNj=8o6Q5UK4YKeo3-NzALorikKJbg@mail.gmail.com
Lists: pgsql-general pgsql-hackers

Hello everyone,

I noticed that the documentation at
http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
does not contain steps, for a multiple-slave setup, for re-establishing
the remaining slaves after one slave has become the new master.

Based on the documentation, here are the most fail-proof steps I came up with:

1. Master dies :(
2. Touch the trigger file on the most caught up slave.
3. Slave is now the new master :)
4. Use pg_basebackup or another binary replication trick (rsync, tar
over ssh, etc.) to bring the other slaves up to speed with the new
master.
5. Start the other slaves, pointing them at the new master.
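
One way to decide which slave is "most caught up" in step 2 is to compare
the WAL location each slave reports, for example from
pg_last_xlog_receive_location(). The following is a minimal Python sketch,
assuming the locations have already been collected as "hi/lo" hex strings.
The host names, the sample values, and both helper names are hypothetical.

```python
def parse_xlog_location(location):
    """Split an 'hi/lo' hex WAL location such as '2F/8A000148' into an integer pair."""
    hi, lo = location.strip().split("/")
    return (int(hi, 16), int(lo, 16))


def most_caught_up(locations):
    """Return the host whose reported WAL location is furthest ahead.

    `locations` maps a host name to the location string reported by that slave.
    """
    if not locations:
        raise ValueError("no slaves reported a location")
    # Compare numerically, not as strings: 'A/0' is behind '1F/0'.
    return max(locations, key=lambda host: parse_xlog_location(locations[host]))


# Hypothetical values, for illustration only.
reported = {
    "slave1": "2F/8A000148",
    "slave2": "2F/8B0000A0",
    "slave3": "2E/FF000000",
}
best = most_caught_up(reported)
```

Comparing the parsed pairs rather than the raw strings matters because the
two halves are hexadecimal and can differ in length.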

But that takes time (about 1-2 hours) with my medium-sized DB
(currently 580GB).

After testing a few different ideas that I gleaned from posts on the
mailing list, I came up with this alternative method:

1. Master dies :(
2. Touch the trigger file on the most caught up slave
3. Slave is now the new master.
4. On each of the other slaves, do steps 5-9:
5. Shut down postgres on the slave
6. Delete every file in /data/pgsql/data/pg_xlog
7. Modify the recovery.conf file to point to the new master and
include the line "recovery_target_timeline='latest'"
8. Copy the history file from the new master to the slave (it's the
most recent #.history file in the new master's pg_xlog directory)
9. Start postgres on the slave and watch it sync up to the new
master (usually about 1-5 minutes)
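
Steps 7 and 8 can be sketched in Python as follows. This is an illustration
only, not a tested failover tool: the host, port, user, and trigger file
path are placeholders, the helper names are hypothetical, and it assumes
timeline history files are named as eight hex digits followed by
".history".

```python
import re

# Timeline history files look like '00000002.history' (assumed naming).
HISTORY_RE = re.compile(r"^([0-9A-Fa-f]{8})\.history$")


def latest_history_file(filenames):
    """Return the .history file with the highest timeline ID, or None."""
    candidates = []
    for name in filenames:
        match = HISTORY_RE.match(name)
        if match:
            candidates.append((int(match.group(1), 16), name))
    return max(candidates)[1] if candidates else None


def render_recovery_conf(host, port, user, trigger_file):
    """Build recovery.conf text that follows the new master's timeline."""
    lines = [
        "standby_mode = 'on'",
        "primary_conninfo = 'host=%s port=%d user=%s'" % (host, port, user),
        "recovery_target_timeline = 'latest'",
        "trigger_file = '%s'" % trigger_file,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical directory listing and connection details.
listing = [
    "000000010000002F0000008A",
    "00000002.history",
    "00000003.history",
    "000000030000002F0000008B",
]
newest = latest_history_file(listing)
conf = render_recovery_conf(
    "new-master.example.com", 5432, "replication", "/tmp/pgsql.trigger"
)
```

Here `newest` is the file to copy in step 8, and `conf` is the text to write
to recovery.conf in step 7 before starting postgres again.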

My question is this: is the alternative method adequate? I tested it a
bit and couldn't find any problems with data loss or inconsistency.

I still use the fail-proof method above to re-incorporate the old
master as a new slave.

Sincerely,
-Ken
