Re: Teaching pg_receivexlog to follow timeline switches

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-16 16:08:31
Message-ID: 50F6D07F.9010207@vmware.com
Lists: pgsql-hackers

On 15.01.2013 20:22, Fujii Masao wrote:
> On Tue, Jan 15, 2013 at 11:05 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> Now that a standby server can follow timeline switches through streaming
>> replication, we should teach pg_receivexlog to do the same. Patch
>> attached.
>>
>> I made one change to the way START_STREAMING command works, to better
>> support this. When a standby server reaches the end of the timeline it's streaming from
>> the master, it stops streaming, fetches any missing timeline history files,
>> and parses the history file of the latest timeline to figure out where to
>> continue. However, I don't want to parse timeline history files in
>> pg_receivexlog. Better to keep it simple. So instead, I modified the
>> server-side code for START_STREAMING to return the next timeline's ID at the
>> end, and used that in pg_receivexlog. I also modified BASE_BACKUP to return
>> not only the start XLogRecPtr, but also the corresponding timeline ID.
>> Otherwise we might try to start streaming from wrong timeline if you issue a
>> BASE_BACKUP at the same moment the server switches to a new timeline.
>>
>> When pg_receivexlog switches timeline, what to do with the partial file on
>> the old timeline? When the timeline changes in the middle of a WAL segment,
>> the segment on the old timeline is only half-filled. For example, when
>> timeline changes from 1 to 2, you'll have this in pg_xlog:
>>
>> 000000010000000000000006
>> 000000010000000000000007
>> 000000010000000000000008
>> 000000020000000000000008
>> 00000002.history
>>
>> The segment 000000010000000000000008 is only half-filled, as the timeline
>> changed in the middle of that segment. The beginning portion of that file is
>> duplicated in 000000020000000000000008, with the timeline-changing
>> checkpoint record right after the duplicated portion.
>>
>> When we stream that with pg_receivexlog, and hit the timeline switch, we'll
>> have this situation in the client:
>>
>> 000000010000000000000006
>> 000000010000000000000007
>> 000000010000000000000008.partial
>>
>> What to do with the partial file? One option is to rename it to
>> 000000010000000000000008. However, if you then kill pg_receivexlog before it
>> has finished streaming a full segment from the new timeline, on restart it
>> will try to begin streaming WAL segment 000000010000000000000009, because it
>> sees that segment 000000010000000000000008 is already completed. That'd be
>> wrong.
>
> Can't we safely rename the .partial file after we receive a full segment
> of the WAL file with the new timeline and the same logid/segmentid?
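The START_STREAMING change described in the quoted patch (the server reports the next timeline's ID when a timeline ends, so the client never parses history files) can be sketched as follows. This is a minimal model under stated assumptions, not PostgreSQL code: `FakeServer`, `start_streaming`, `parse_history` and `receive_all` are hypothetical names; the history-line format (`parentTLI<TAB>X/X<TAB>reason`) and 16 MB segments are assumed; and restarting at the beginning of the segment that contains the switchpoint is an assumption consistent with the duplicated-segment layout in the quoted example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size


def parse_history(text: str) -> List[Tuple[int, int]]:
    """Server side only: parse assumed 'parentTLI<TAB>X/X<TAB>reason' lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tli_text, lsn_text = line.split("\t")[:2]
        hi, lo = lsn_text.split("/")
        entries.append((int(tli_text), (int(hi, 16) << 32) | int(lo, 16)))
    return entries


@dataclass
class FakeServer:
    """Hypothetical stand-in for the server side of START_STREAMING."""
    current_tli: int
    current_end: int                # end of WAL on the latest timeline
    history: List[Tuple[int, int]]  # (ancestor tli, switchpoint), oldest first

    def start_streaming(self, tli: int, startpos: int) -> Tuple[int, Optional[int]]:
        """Return (end LSN, next timeline ID or None). startpos is unused here."""
        if tli == self.current_tli:
            return self.current_end, None
        for i, (hist_tli, switchpoint) in enumerate(self.history):
            if hist_tli == tli:
                # Only the server consults the history to find the successor.
                if i + 1 < len(self.history):
                    return switchpoint, self.history[i + 1][0]
                return switchpoint, self.current_tli
        raise ValueError(f"timeline {tli} is not in the server's history")


def receive_all(server: FakeServer, tli: int, startpos: int):
    """Client loop (hypothetical): follow switches using only the returned TLI."""
    streamed = []
    while True:
        end, next_tli = server.start_streaming(tli, startpos)
        streamed.append((tli, startpos, end))
        if next_tli is None:
            return streamed  # reached the server's current timeline
        # Assumed: restart at the start of the segment holding the switchpoint,
        # so that segment is written in full on the new timeline.
        tli, startpos = next_tli, end - end % XLOG_SEG_SIZE
```

The client needs no knowledge of history files: each pass ends with either "you are on the current timeline" or "continue on timeline N", which is the simplification the patch aims for.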

I'd prefer to leave the .partial suffix in place, as the segment really
isn't complete. It doesn't make a difference when you recover to the
latest timeline, but if you have a more complicated scenario with
multiple timelines that are still "alive", i.e. there's a server still
actively generating WAL on that timeline, you'll easily get confused.
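To make "really isn't complete" concrete: in the quoted example, timeline 1 ends partway through segment 8, so only the bytes before the switchpoint are valid in 000000010000000000000008, and those same bytes open 000000020000000000000008. The sketch below assumes 16 MB segments and a hypothetical switchpoint of 0/8000A28; the helper names are illustrative, not PostgreSQL functions.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024                    # assumed 16 MB segments
SEGMENTS_PER_XLOGID = 0x100000000 // XLOG_SEG_SIZE  # 256 segments per 4 GB "log id"


def segment_file_name(tli: int, lsn: int) -> str:
    """Hypothetical helper: WAL segment file on timeline tli containing lsn."""
    segno = lsn // XLOG_SEG_SIZE
    return "%08X%08X%08X" % (tli, segno // SEGMENTS_PER_XLOGID, segno % SEGMENTS_PER_XLOGID)


def duplicated_prefix(switchpoint: int) -> int:
    """Bytes of the switch segment that are valid on both old and new timeline."""
    return switchpoint % XLOG_SEG_SIZE


switchpoint = 0x08000A28  # hypothetical LSN 0/8000A28 where timeline 1 ends
print(segment_file_name(1, switchpoint))  # 000000010000000000000008 (half-filled)
print(segment_file_name(2, switchpoint))  # 000000020000000000000008
print(duplicated_prefix(switchpoint))     # 2600
```

Nothing in the old-timeline file name records that only the first 2600 bytes are meaningful; a `.partial` suffix is the only on-disk hint.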

As an example, imagine that you have a master server, and one standby.
You maintain a WAL archive for backup purposes with pg_receivexlog,
connected to the standby. Now, for some reason, you get a split-brain
situation and the standby server is promoted with new timeline 2, while
the real master is still running. The DBA notices the problem, and kills
the standby and pg_receivexlog. He deletes the XLOG files belonging to
timeline 2 in pg_receivexlog's target directory, and re-points
pg_receivexlog to the master while he re-builds the standby server from
backup. At that point, pg_receivexlog will start streaming from the end
of the zero-padded segment, not knowing that it was partial, and you
have a hole in the archived WAL stream. Oops.
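The hole follows from how the restart position is chosen. The sketch below models a simplified resume rule, assumed from the behaviour described in this thread rather than taken from pg_receivexlog's source: take the highest segment in the directory; if it is `.partial`, re-stream it from its beginning, otherwise start at the next segment. `find_streaming_start` is a hypothetical name.

```python
import re

SEGMENTS_PER_XLOGID = 256  # assumes 16 MB segments
# 24 hex digits (timeline, log id, segment), optionally followed by ".partial"
SEG_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})(\.partial)?$")


def find_streaming_start(filenames):
    """Hypothetical resume rule: return (tli, segno) to stream next, or None."""
    best = None  # (segno, tli, is_partial)
    for name in filenames:
        m = SEG_NAME.match(name)
        if not m:
            continue  # ignore e.g. 00000002.history
        tli = int(m.group(1), 16)
        segno = int(m.group(2), 16) * SEGMENTS_PER_XLOGID + int(m.group(3), 16)
        candidate = (segno, tli, m.group(4) is not None)
        if best is None or candidate[:2] > best[:2]:
            best = candidate
    if best is None:
        return None
    segno, tli, is_partial = best
    # A partial segment is fetched again from its start; a complete one is done.
    return (tli, segno if is_partial else segno + 1)


kept = ["000000010000000000000007", "000000010000000000000008.partial"]
renamed = ["000000010000000000000007", "000000010000000000000008"]
print(find_streaming_start(kept))     # (1, 8): segment 8 is streamed again
print(find_streaming_start(renamed))  # (1, 9): rest of segment 8 is never fetched
```

With the suffix kept, re-pointing pg_receivexlog at the master re-fetches segment 8 on timeline 1; with the file renamed, the tail of that segment is silently skipped.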

The DBA could avoid that by also removing the last WAL segment on
timeline 1, the one that was partial. But it's really not obvious that
there's anything wrong with that segment. Keeping the .partial suffix
makes it clear.

- Heikki
