More subtle issues with cascading replication over timeline switches

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: More subtle issues with cascading replication over timeline switches
Date: 2013-01-18 11:57:54
Message-ID: 50F938C2.9000904@vmware.com
Lists: pgsql-hackers

When a standby starts up, and catches up with the master through the
archive, it copies the target timeline's history file from the archive
to pg_xlog. That's enough for that standby's purposes, but if there is a
cascading standby or pg_receivexlog connected to it, that client will
request the timeline history files of *all* timelines between the
starting point and the current target.

For example, create a master, and take a base backup from it. Use the
base backup to initialize two standby servers. Now perform failover
first to the first standby, and once the second standby has caught up,
fail over again, to the second standby. (Or as a shortcut, forget about
the standbys, and just create a recovery.conf file in the master with
restore_command='/bin/false' and restart it. That causes a timeline
switch. Repeat twice.)

Now use the base backup to initialize a new standby server (you can kill
and delete the old ones), using the WAL archive. Set up a second,
cascading, standby server that connects to the first standby using
streaming replication, not WAL archive. This cascading standby will fail
to cross the timeline switch, because it doesn't find all the history
files in the standby:

C 2013-01-18 13:38:46.047 EET 7695 FATAL: could not receive timeline
history file from the primary server: ERROR: could not open file
"pg_xlog/00000002.history": No such file or directory

Indeed, looking at pg_xlog, it's not there (I did a couple of extra
timeline switches):

~/pgsql.master$ ls -l data-master/pg_xlog/
total 131084
-rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000010000000000000001
-rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000010000000000000002
-rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000010000000000000003
-rw------- 1 heikki heikki 41 Jan 18 13:38 00000002.history
-rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000020000000000000003
-rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000020000000000000004
-rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000020000000000000005
-rw------- 1 heikki heikki 83 Jan 18 13:38 00000003.history
-rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000030000000000000005
-rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000030000000000000006
drwx------ 2 heikki heikki 4096 Jan 18 13:38 archive_status
~/pgsql.master$ ls -l data-standbyB/pg_xlog/
total 81928
-rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000010000000000000001
-rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000010000000000000002
-rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000020000000000000003
-rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000020000000000000004
-rw------- 1 heikki heikki 83 Jan 18 13:38 00000003.history
-rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000030000000000000005
drwx------ 2 heikki heikki 4096 Jan 18 13:38 archive_status

This can be thought of as another variant of the same issue that was
fixed by commit 60df192aea0e6458f20301546e11f7673c102101. When standby B
scans for the latest timeline, it finds it to be 3, and it reads the
timeline history file for 3. After that patch, it also saves it in
pg_xlog. It doesn't save the timeline history file for timeline 2,
because that's included in the history of timeline 3. However, when
standby C connects, it will try to fetch all the history files that it
doesn't have, including 00000002.history, which throws the error.
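
To make the mismatch concrete, here is a minimal Python sketch of the check described above. It assumes that history files are named with the timeline ID as eight hexadecimal digits followed by .history (as in the listings), that timeline 1 has no history file, and that the downstream asks for every timeline after its starting timeline up to the target. The function names are made up for illustration and are not PostgreSQL APIs.

```python
def history_file_name(tli):
    """Name of the history file for a timeline ID, e.g. 2 -> 00000002.history."""
    return "%08X.history" % tli


def missing_history_files(start_tli, target_tli, files_in_pg_xlog):
    """History files a downstream would request but the upstream lacks.

    Assumption: the downstream requests the history file of every timeline
    after start_tli, up to and including target_tli.
    """
    present = set(files_in_pg_xlog)
    wanted = [history_file_name(t) for t in range(start_tli + 1, target_tli + 1)]
    return [name for name in wanted if name not in present]


# Contents of data-standbyB/pg_xlog, copied from the listing above.
standby_b = [
    "000000010000000000000001",
    "000000010000000000000002",
    "000000020000000000000003",
    "000000020000000000000004",
    "00000003.history",
    "000000030000000000000005",
    "archive_status",
]

print(missing_history_files(1, 3, standby_b))  # ['00000002.history']
```

The one missing name is the file that standby C's request fails on.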

A related problem is that at the segment containing the timeline switch,
the standby has only restored from the archive the WAL file of the new
timeline, not the old one. In the example above, the timeline switch
1 -> 2 happened while inserting to segment 000000010000000000000003, and
a copy of that partial segment was created with the new timeline's ID as
000000020000000000000003. The standby only has the segment from the new
timeline in pg_xlog, which is enough for that standby's purposes, but
will cause an error when the cascading standby tries to stream it:

C 2013-01-18 13:46:12.334 EET 8579 FATAL: error reading result of
streaming command: ERROR: requested WAL segment
000000010000000000000003 has already been removed
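
The gap can be spotted from the file names alone. The following Python sketch assumes the naming visible in the listings: 24 hexadecimal digits, the first 8 being the timeline ID and the rest identifying the segment's position. It also makes two simplifying assumptions that hold in this example but not in general: each timeline's parent is the timeline numbered one lower, and the first segment present on a timeline is the one containing the switch. The function names are hypothetical.

```python
def parse_segment_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, position string)."""
    if len(name) != 24:
        raise ValueError("not a WAL segment name: %r" % name)
    int(name[8:], 16)  # raises ValueError if the position part is not hex
    return int(name[:8], 16), name[8:]


def missing_old_timeline_segments(file_names):
    """Old-timeline copies of switch segments that are absent.

    Assumptions (true for the example only): the parent of timeline N is
    N - 1, and the lowest segment present on N contains the switch.
    """
    first_pos = {}
    present = set()
    for name in file_names:
        try:
            tli, pos = parse_segment_name(name)
        except ValueError:
            continue  # history files, archive_status, and so on
        present.add((tli, pos))
        # Fixed-width uppercase hex strings compare in numeric order.
        if tli not in first_pos or pos < first_pos[tli]:
            first_pos[tli] = pos
    missing = []
    for tli in sorted(first_pos):
        if tli == 1:
            continue  # timeline 1 has no parent
        if (tli - 1, first_pos[tli]) not in present:
            missing.append("%08X%s" % (tli - 1, first_pos[tli]))
    return missing


# Contents of data-standbyB/pg_xlog, copied from the listing above.
standby_b = [
    "000000010000000000000001",
    "000000010000000000000002",
    "000000020000000000000003",
    "000000020000000000000004",
    "00000003.history",
    "000000030000000000000005",
    "archive_status",
]

print(missing_old_timeline_segments(standby_b))
# ['000000010000000000000003', '000000020000000000000005']
```

The first name reported is the segment from the error message above. Run against the master's listing, the same check reports nothing missing.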

A straightforward fix would be for the standby to restore those files
that the cascading standby needs from the WAL archive, even if they're
not strictly required for that standby itself. But actually, isn't it a
bad idea that we store the partial segment, 000000010000000000000003 in
this case, in the WAL archive? There's no way to tell that it's partial,
and it can clash with a complete segment if more WAL is generated on
that timeline. I just argued that pg_receivexlog should not do that, and
hence keep the .partial suffix in the same situation, in
http://www.postgresql.org/message-id/50F56245.8050802@vmware.com.
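
As a rough illustration of the clash, the Python sketch below models the archive as a dictionary keyed by file name. Everything in it is hypothetical: the archive model, the function names, and the rule of appending .partial to incomplete segments. It only shows why a distinct name avoids the collision, not how the server or pg_receivexlog behaves.

```python
def archive_name(segment, complete, use_suffix):
    """Hypothetical rule: mark incomplete segments with a .partial suffix."""
    if use_suffix and not complete:
        return segment + ".partial"
    return segment


def archive(store, segment, complete, use_suffix):
    """Add a segment to the archive; return False if the name is taken."""
    name = archive_name(segment, complete, use_suffix)
    if name in store:
        return False
    store[name] = complete
    return True


seg = "000000010000000000000003"

# Without a suffix, the partial copy occupies the name of the complete one.
plain = {}
print(archive(plain, seg, complete=False, use_suffix=False))  # True
print(archive(plain, seg, complete=True, use_suffix=False))   # False

# With a suffix, both can be stored and told apart.
marked = {}
print(archive(marked, seg, complete=False, use_suffix=True))  # True
print(archive(marked, seg, complete=True, use_suffix=True))   # True
```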

This needs some more thought. I'll try to come up with something next
week, but if anyone has any ideas..

- Heikki
