Re: Teaching pg_receivexlog to follow timeline switches

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-16 17:06:48
Message-ID: CAHGQGwGN9QMLJ8Xb7G5v77OsohGhQNuiB-pmceBW5JEUTxe+-w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 17, 2013 at 1:08 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 15.01.2013 20:22, Fujii Masao wrote:
>>
>> On Tue, Jan 15, 2013 at 11:05 PM, Heikki Linnakangas
>> <hlinnakangas(at)vmware(dot)com> wrote:
>>>
>>> Now that a standby server can follow timeline switches through streaming
>>> replication, we should teach pg_receivexlog to do the same. Patch
>>> attached.
>>>
>>> I made one change to the way the START_STREAMING command works, to
>>> better support this. When a standby server reaches the timeline it's
>>> streaming from the master, it stops streaming, fetches any missing
>>> timeline history files, and parses the history file of the latest
>>> timeline to figure out where to continue. However, I don't want to parse
>>> timeline history files in pg_receivexlog. Better to keep it simple. So
>>> instead, I modified the server-side code for START_STREAMING to return
>>> the next timeline's ID at the end, and used that in pg_receivexlog. I
>>> also modified BASE_BACKUP to return not only the start XLogRecPtr, but
>>> also the corresponding timeline ID. Otherwise we might try to start
>>> streaming from the wrong timeline if you issue a BASE_BACKUP at the same
>>> moment the server switches to a new timeline.
>>>
>>> When pg_receivexlog switches timeline, what to do with the partial file
>>> on the old timeline? When the timeline changes in the middle of a WAL
>>> segment, the segment on the old timeline is only half-filled. For
>>> example, when the timeline changes from 1 to 2, you'll have this in
>>> pg_xlog:
>>>
>>> 000000010000000000000006
>>> 000000010000000000000007
>>> 000000010000000000000008
>>> 000000020000000000000008
>>> 00000002.history
>>>
>>> The segment 000000010000000000000008 is only half-filled, as the timeline
>>> changed in the middle of that segment. The beginning portion of that file
>>> is duplicated in 000000020000000000000008, with the timeline-changing
>>> checkpoint record right after the duplicated portion.
>>>
>>> When we stream that with pg_receivexlog, and hit the timeline switch,
>>> we'll have this situation in the client:
>>>
>>> 000000010000000000000006
>>> 000000010000000000000007
>>> 000000010000000000000008.partial
>>>
>>> What to do with the partial file? One option is to rename it to
>>> 000000010000000000000008. However, if you then kill pg_receivexlog before
>>> it has finished streaming a full segment from the new timeline, on
>>> restart it will try to begin streaming WAL segment
>>> 000000010000000000000009, because it sees that segment
>>> 000000010000000000000008 is already completed. That'd be wrong.
>>
>>
>> Can't we rename the .partial file safely after we receive a full segment
>> of the WAL file with the new timeline and the same logid/segmentid?
>
>
> I'd prefer to leave the .partial suffix in place, as the segment really
> isn't complete. It doesn't make a difference when you recover to the latest
> timeline, but if you have a more complicated scenario with multiple
> timelines that are still "alive", i.e. there's a server still actively
> generating WAL on that timeline, you'll easily get confused.
>
> As an example, imagine that you have a master server, and one standby. You
> maintain a WAL archive for backup purposes with pg_receivexlog, connected to
> the standby. Now, for some reason, you get a split-brain situation and the
> standby server is promoted with new timeline 2, while the real master is
> still running. The DBA notices the problem, and kills the standby and
> pg_receivexlog. He deletes the XLOG files belonging to timeline 2 in
> pg_receivexlog's target directory, and re-points pg_receivexlog to the master
> while he re-builds the standby server from backup. At that point,
> pg_receivexlog will start streaming from the end of the zero-padded segment,
> not knowing that it was partial, and you have a hole in the archived WAL
> stream. Oops.
>
> The DBA could avoid that by also removing the last WAL segment on timeline
> 1, the one that was partial. But it's really not obvious that there's
> anything wrong with that segment. Keeping the .partial suffix makes it
> clear.

Thanks for elaborating on why the .partial suffix should be kept.
I agree that keeping the .partial suffix would be safer.
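
To make the restart hazard concrete, here is a rough sketch of how a
client might pick its resume point from the files in its target directory.
This is not the actual pg_receivexlog code. The function names are
hypothetical, and it assumes 24-hex-digit segment names, 256 segments per
log ID (i.e. 16 MB segments), and that the file with the highest segment
number decides where to resume.

```python
import re

# Assumed layout: 8 hex digits each for timeline ID, log ID and segment ID,
# optionally followed by ".partial" for a segment that is not yet complete.
_WAL_NAME = re.compile(
    r"([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})(\.partial)?"
)

# Assumption: 16 MB segments, so 0x100 segments per log ID.
SEGMENTS_PER_LOGID = 0x100


def parse_wal_name(name):
    """Return (timeline, segno, is_partial), or None for non-WAL files."""
    m = _WAL_NAME.fullmatch(name)
    if m is None:
        return None  # e.g. "00000002.history"
    tli = int(m.group(1), 16)
    segno = int(m.group(2), 16) * SEGMENTS_PER_LOGID + int(m.group(3), 16)
    return tli, segno, m.group(4) is not None


def wal_name(tli, segno):
    """Format a segment file name from a timeline ID and segment number."""
    return "%08X%08X%08X" % (
        tli, segno // SEGMENTS_PER_LOGID, segno % SEGMENTS_PER_LOGID
    )


def resume_segment(filenames):
    """Pick the segment to resume streaming from (hypothetical logic).

    A .partial file is streamed again from its start; a file without the
    suffix is trusted as complete, so streaming resumes at the next segment.
    """
    newest = None  # ((segno, tli), is_partial) of the newest file seen
    for name in filenames:
        parsed = parse_wal_name(name)
        if parsed is None:
            continue
        tli, segno, partial = parsed
        key = (segno, tli)
        if newest is None or key > newest[0]:
            newest = (key, partial)
    if newest is None:
        return None
    (segno, tli), partial = newest
    return wal_name(tli, segno if partial else segno + 1)


kept = [
    "000000010000000000000006",
    "000000010000000000000007",
    "000000010000000000000008.partial",
]
renamed = [n.replace(".partial", "") for n in kept]

print(resume_segment(kept))     # resumes at segment ...08 on timeline 1
print(resume_segment(renamed))  # skips ahead to ...09, leaving a hole
```

With the suffix kept, the half-filled segment is fetched again. With the
suffix dropped, nothing in the directory says that segment 8 on timeline 1
was incomplete.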

Regards,

--
Fujii Masao
