Re: pg_basebackup from cascading standby after timeline switch

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup from cascading standby after timeline switch
Date: 2012-12-21 12:54:02
Message-ID: 50D45BEA.4070409@vmware.com
Lists: pgsql-hackers

On 17.12.2012 18:58, Magnus Hagander wrote:
> On Mon, Dec 17, 2012 at 5:19 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Heikki Linnakangas<hlinnakangas(at)vmware(dot)com> writes:
>>> I'm not happy with the fact that we just ignore the problem in a backup
>>> taken from a standby, silently giving the user a backup that won't start
>>> up. Why not include the timeline history file in the backup?
>>
>> +1. I was not aware that we weren't doing that --- it seems pretty
>> foolish, especially since as you say they're tiny.
>
> Yeah, +1. That should probably have been a part of the whole
> "basebackup from slave" patch, so it can probably be considered a
> back-patchable bugfix in itself, no?

Yes, this should be backpatched to 9.2. I came up with the attached.

However, thinking about this some more, there's another bug in the way
WAL files are included in the backup when a timeline switch happens.
basebackup.c includes all the WAL files on ThisTimeLineID, but when the
backup is taken from a standby, the standby might've followed a timeline
switch. So it's possible that some of the WAL files should come from
timeline 1, while others should come from timeline 2. This leads to an
error like "requested WAL segment 00000001000000000000000C has already
been removed" in pg_basebackup.
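
To illustrate the failure, here is a minimal Python sketch (not the actual basebackup.c logic). It assumes the usual 24-hex-digit WAL file name layout: 8 digits of timeline ID, 8 of log number, 8 of segment number. The helper names and the directory contents are hypothetical, chosen so that the first missing file matches the segment named in the error above.

```python
def wal_file_name(tli: int, log: int, seg: int) -> str:
    """Build a WAL segment file name from timeline, log and segment numbers."""
    return f"{tli:08X}{log:08X}{seg:08X}"


def names_on_single_timeline(tli, log, first_seg, last_seg):
    """Naive approach: assume every needed segment lives on one timeline."""
    return [wal_file_name(tli, log, s) for s in range(first_seg, last_seg + 1)]


# Hypothetical pg_xlog contents on a standby that followed a switch from
# timeline 1 to timeline 2 somewhere before segment 0x0C.
on_disk = {
    "00000001000000000000000B",
    "00000002000000000000000C",
    "00000002000000000000000D",
}

# The backup needs segments 0x0B..0x0D, but names them all on timeline 1.
wanted = names_on_single_timeline(1, 0, 0x0B, 0x0D)
missing = [name for name in wanted if name not in on_disk]
print(missing)
```

Under these assumptions the first name that cannot be found is 00000001000000000000000C, even though the same segment is present under timeline 2.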

Attached is a script to reproduce that bug, if someone wants to play
with it. It's a bit sensitive to timing, and the paths at the top need
tweaking.

One solution would be to pay more attention to which timelines the WAL
should come from. basebackup.c could read the timeline history file, to
see exactly where the timeline switches happened, and then construct the
filename of each WAL segment using the correct timeline id. Another
approach would be to do readdir() on pg_xlog, and include all WAL files,
regardless of timeline IDs, that fall in the right XLogRecPtr range. The
latter seems easier to backpatch.
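
The two approaches can be sketched roughly as follows. This is Python for illustration only, not the attached patch. It assumes the 24-hex-digit WAL file name layout (timeline, log, segment), compares positions as (log, segment) pairs instead of real XLogRecPtr values, and represents the timeline history as a hypothetical in-memory list of (timeline, first segment) pairs instead of parsing an actual history file. All function names are made up.

```python
import re

WAL_NAME = re.compile(r"[0-9A-F]{24}")


def segment_key(name):
    """Return (log, seg) from a WAL file name, ignoring the timeline prefix."""
    return int(name[8:16], 16), int(name[16:24], 16)


def names_from_history(history, keys):
    """Approach 1: pick the timeline for each segment from the history.

    `history` is a hypothetical list of (tli, first_key) pairs, sorted by
    first_key, saying from which segment each timeline is used.
    """
    names = []
    for key in keys:
        # Last timeline whose starting segment is at or before this one.
        tli = [t for t, first in history if first <= key][-1]
        names.append(f"{tli:08X}{key[0]:08X}{key[1]:08X}")
    return names


def select_wal_files(dir_entries, start_key, end_key):
    """Approach 2: take every WAL file in range, whatever its timeline."""
    selected = [
        name for name in dir_entries
        if WAL_NAME.fullmatch(name)
        and start_key <= segment_key(name) <= end_key
    ]
    # Order by position first, then by timeline, for a stable listing.
    return sorted(selected, key=lambda n: (segment_key(n), n))


# Hypothetical directory listing, as readdir() on pg_xlog might return it.
entries = [
    "00000001000000000000000A",
    "00000001000000000000000B",
    "00000002000000000000000C",
    "00000001000000000000000C",
    "00000002000000000000000D",
    "00000002000000000000000E",
    "00000002.history",
    "archive_status",
]

history = [(1, (0, 0x00)), (2, (0, 0x0C))]
keys = [(0, 0x0B), (0, 0x0C), (0, 0x0D)]

print(names_from_history(history, keys))
print(select_wal_files(entries, (0, 0x0B), (0, 0x0D)))
```

The second function never needs to know where the switch happened. It may pick up the same segment on more than one timeline, at the cost of a slightly larger backup.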

- Heikki

Attachment                                Content-Type      Size
include-all-tli-files-in-backup-1.patch   text/x-diff       3.4 KB
recipe12.sh                               application/x-sh  4.5 KB
