Re: pg_rewind, a tool for resynchronizing an old master after failover

From: Thom Brown <thom(at)linux(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_rewind, a tool for resynchronizing an old master after failover
Date: 2013-05-23 14:03:51
Message-ID: CAA-aLv6cd-JpN2qkBotcKkMiFuab-xjMDRBpoQ26ySys4X1SnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23 May 2013 07:10, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> Hi,
>
> I've been hacking on a tool to allow resynchronizing an old master server
> after failover. The need to do a full backup/restore has been a common
> complaint ever since we've had streaming replication. I saw on the wiki that
> this was discussed in the dev meeting; too bad I couldn't make it.
>
> In a nutshell, the idea is to do copy everything that has changed between
> the cluster, like rsync does, but instead of reading through all files, use
> the WAL to determine what has changed. Here's a somewhat more detailed
> explanation, from the README:
>
> Theory of operation
> -------------------
>
> The basic idea is to copy everything from the new cluster to old, except for
> the blocks that we know to be the same.
>
> 1. Scan the WAL log of the old cluster, starting from the point where
> the new cluster's timeline history forked off from the old cluster. For each
> WAL record, make a note of the data blocks that are touched. This yields a
> list of all the data blocks that were changed in the old cluster, after the
> new cluster forked off.
>
> 2. Copy all those changed blocks from the new master to the old master.
>
> 3. Copy all other files like clog, conf files etc. from the new cluster
> to old. Everything except the relation files.
>
> 4. Apply the WAL from the new master, starting from the checkpoint
> created at failover. (pg_rewind doesn't actually apply the WAL, it just
> creates a backup label file indicating that when PostgreSQL is started, it
> will start replay from that checkpoint and apply all the required WAL)
>
>
> Please take a look: https://github.com/vmware/pg_rewind

6 instances set up:

[Primary (5530)]
|
---[Standby 1 (5531)]
|
---[Standby 2 (5532)]
|
---[Standby 3 (5533)]
|
---[Standby 4 (5534)]
|
---[Standby 5 (5535)]

1) Created a table on the primary with some data.
2) Promoted Standby 1
3) Cleanly shut down Primary
4) pg_rewind --target-pgdata=/tmp/primary
--source-server='host=localhost port=5531 dbname=postgres'

Last common WAL position: 0/30227F8 on timeline 1
Last common checkpoint at 0/30227F8 on timeline 1
error reading xlog record: record with zero length at 0/3022860
Done!

Contents of pg_xlog directory in Primary and Standby 1:
thom(at)swift /tmp $ ls -l primary/pg_xlog/
total 49156
-rw------- 1 thom users 16777216 May 23 09:52 000000010000000000000002
-rw------- 1 thom users 16777216 May 23 09:52 000000010000000000000003
-rw------- 1 thom users 16777216 May 23 09:52 000000020000000000000003
-rw------- 1 thom users 41 May 23 09:52 00000002.history
drwx------ 2 thom users 80 May 23 09:52 archive_status
thom(at)swift /tmp $ ls -l standby1/pg_xlog/
total 49156
-rw------- 1 thom users 16777216 May 23 09:49 000000010000000000000002
-rw------- 1 thom users 16777216 May 23 09:50 000000010000000000000003
-rw------- 1 thom users 16777216 May 23 09:52 000000020000000000000003
-rw------- 1 thom users 41 May 23 09:50 00000002.history
drwx------ 2 thom users 80 May 23 09:50 archive_status

5) Changed recovery.done in primary to point its primary_conninfo port
to 5531 (that of Standby 1).
6) Renamed it to .conf.
7) Changed postgresql.conf to set the port back to its original one
(as pg_rewind has caused it to match that of Standby 1)
8) Start Primary

Latest log in primary reads:

LOG: database system was interrupted; last known up at 2013-05-23 09:50:34 EDT
LOG: entering standby mode
LOG: invalid xl_info in checkpoint record
FATAL: could not locate required checkpoint record
HINT: If you are not restoring from a backup, try removing the file
"/tmp/primary/backup_label".
LOG: startup process (PID 31503) exited with exit code 1
LOG: aborting startup due to startup process failure

9) Okay, so I'll delete that label and try again. Now all is well:

LOG: database system was interrupted; last known up at 2013-05-23 09:50:34 EDT
LOG: entering standby mode
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/3022828
LOG: record with zero length at 0/3041A60
LOG: consistent recovery state reached at 0/3041A60
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 2

10) Connect to Standby 1 and insert more rows into our original table.
11) Connect to Primary and those rows are appearing.

--
Thom

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-05-23 14:31:53 Re: Time limit for a process to hold Content lock in Buffer Cache
Previous Message German Becker 2013-05-23 13:18:53 Re: WAL segments (names) not in a sequence