Re: Teaching pg_receivexlog to follow timeline switches

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-17 14:59:19
Message-ID: 50F811C7.4080100@vmware.com
Lists: pgsql-hackers

On 17.01.2013 16:56, Robert Haas wrote:
> On Wed, Jan 16, 2013 at 11:08 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> I'd prefer to leave the .partial suffix in place, as the segment really
>> isn't complete. It doesn't make a difference when you recover to the latest
>> timeline, but if you have a more complicated scenario with multiple
>> timelines that are still "alive", i.e. there's a server still actively
>> generating WAL on that timeline, you'll easily get confused.
>>
>> As an example, imagine that you have a master server, and one standby. You
>> maintain a WAL archive for backup purposes with pg_receivexlog, connected to
>> the standby. Now, for some reason, you get a split-brain situation and the
>> standby server is promoted with new timeline 2, while the real master is
>> still running. The DBA notices the problem, and kills the standby and
>> pg_receivexlog. He deletes the XLOG files belonging to timeline 2 in
>> pg_receivexlog's target directory, and re-points pg_receivexlog to the master
>> while he re-builds the standby server from backup. At that point,
>> pg_receivexlog will start streaming from the end of the zero-padded segment,
>> not knowing that it was partial, and you have a hole in the archived WAL
>> stream. Oops.
>>
>> The DBA could avoid that by also removing the last WAL segment on timeline
>> 1, the one that was partial. But it's really not obvious that there's
>> anything wrong with that segment. Keeping the .partial suffix makes it
>> clear.
>
> I shudder at the idea that the DBA is manually involved in any of this.

The scenario I described is one where you have screwed up your failover
environment and ended up in a split-brain situation by accident. The
DBA certainly needs to be involved to recover from that.

- Heikki
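
To make the quoted scenario concrete, here is a minimal Python sketch of
how a resume point might be chosen from the files in the target
directory. It is an illustration only, not pg_receivexlog's actual code:
the helper names parse_segment and resume_segment are hypothetical, and
it assumes the default 16 MB segment size (0x100 segments per log file
id) and the usual 24-hex-digit WAL file names.

```python
import re

# Assumption: default 16 MB WAL segments, i.e. 0x100 segments per "log" id.
SEGMENTS_PER_XLOGID = 0x100

# TTTTTTTTLLLLLLLLSSSSSSSS with an optional ".partial" suffix.
_WAL_NAME = re.compile(
    r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})(\.partial)?$"
)


def parse_segment(name):
    """Return (timeline, segment_number, is_partial), or None if not a WAL file."""
    m = _WAL_NAME.match(name)
    if m is None:
        return None
    timeline = int(m.group(1), 16)
    segno = int(m.group(2), 16) * SEGMENTS_PER_XLOGID + int(m.group(3), 16)
    return timeline, segno, m.group(4) is not None


def resume_segment(filenames, timeline):
    """Hypothetical resume rule: continue after the highest segment that
    looks complete on `timeline`. Files marked .partial are not trusted.
    Returns None when no complete segment exists for that timeline."""
    highest_complete = None
    for name in filenames:
        parsed = parse_segment(name)
        if parsed is None:
            continue
        tli, segno, is_partial = parsed
        if tli != timeline or is_partial:
            continue
        if highest_complete is None or segno > highest_complete:
            highest_complete = segno
    if highest_complete is None:
        return None
    return highest_complete + 1


complete = ["000000010000000000000001", "000000010000000000000002"]

# Suffix dropped: the zero-padded segment 3 looks complete, so streaming
# resumes at segment 4 and the rest of segment 3 is never archived.
print(resume_segment(complete + ["000000010000000000000003"], 1))

# Suffix kept: segment 3 is visibly incomplete, so it is streamed again.
print(resume_segment(complete + ["000000010000000000000003.partial"], 1))
```

The first call prints 4 and the second prints 3. Under these assumptions,
the only thing that separates a gap in the archive from a clean restart
is whether the last timeline-1 segment still carries its .partial
suffix.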
