Re: pg_rewind failure by file deletion in source server

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_rewind failure by file deletion in source server
Date: 2015-07-17 03:28:19
Message-ID: CAB7nPqS8Q5+LpJAVSNsgy1y7kAv6Uf-fzCC9Ja=aD2Dmz9kFbg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 1, 2015 at 9:31 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Jul 1, 2015 at 2:21 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> On 06/29/2015 09:44 AM, Michael Paquier wrote:
>>>
>>> On Mon, Jun 29, 2015 at 4:55 AM, Heikki Linnakangas wrote:
>>>>
>>>> But we'll still need to handle the pg_xlog symlink case somehow. Perhaps it
>>>> would be enough to special-case pg_xlog for now.
>>>
>>>
>>> Well, sure, pg_rewind does not copy the soft links either way. Now it
>>> would be nice to have an option to be able to recreate the soft link
>>> of at least pg_xlog even if it can be scripted as well after a run.
>>
>>
>> Hmm. I'm starting to think that pg_rewind should ignore pg_xlog entirely. In
>> any non-trivial scenarios, just copying all the files from pg_xlog isn't
>> enough anyway, and you need to set up a recovery.conf after running
>> pg_rewind that contains a restore_command or primary_conninfo, to fetch the
>> WAL. So you can argue that by not copying pg_xlog automatically, we're
>> actually doing a favour to the DBA, by forcing him to set up the
>> recovery.conf file correctly. Because if you just test simple scenarios
>> where not much time has passed between the failover and running pg_rewind,
>> it might be enough to just copy all the WAL currently in pg_xlog, but it
>> would not be enough if more time had passed and not all the required WAL is
>> present in pg_xlog anymore. And by not copying the WAL, we can avoid some
>> copying, as restore_command or streaming replication will only copy what's
>> needed, while pg_rewind would copy all WAL it can find in the target's data
>> directory.
>>
>> pg_basebackup also doesn't include any WAL, unless you pass the --xlog
>> option. It would be nice to also add an optional --xlog option to pg_rewind,
>> but with pg_rewind it's possible that all the required WAL isn't present in
>> the pg_xlog directory anymore, so you wouldn't always achieve the same
>> effect of making the backup self-contained.
>>
>> So, I propose the attached. It makes pg_rewind ignore the pg_xlog directory
>> in both the source and the target.
>
> If pg_xlog is simply ignored, some old WAL files may remain in the target server.
> Don't these old files cause the subsequent startup of the target server as a new
> standby to fail? That is, it's the case where a WAL file with the same name
> but different content exists in both target and source. If that's harmful,
> pg_rewind should also remove the files in pg_xlog of the target server.

This would reduce usability. The rewound node replays WAL from the
last checkpoint before the point where WAL forked, up to the minimum
recovery point of the source node on which pg_rewind was run. Hence,
if we completely removed the contents of pg_xlog, we would lose part
of the WAL that has to be replayed before the rewound node switches
timeline during recovery (while streaming from the promoted standby,
or whatever). I don't really see why recycled segments would be a
problem, if that is what you are referring to, but perhaps I am
missing something.
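
To make the range concrete, here is a rough sketch (not pg_rewind code)
that lists the WAL segment file names covering an LSN range. It assumes
the default 16 MB segment size and a single timeline, which is a
simplification since the real range crosses a timeline switch; the helper
names and the LSN values in the usage note are made up for illustration.

```python
# Sketch only: assumes the default 16 MB WAL segment size and one timeline.
WAL_SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE  # 256 with 16 MB segments


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hex into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def segment_name(tli, segno):
    """Build a WAL file name: timeline, log id and segment, 8 hex digits each."""
    return "%08X%08X%08X" % (tli, segno // SEGS_PER_XLOGID, segno % SEGS_PER_XLOGID)


def segments_needed(tli, start_lsn, end_lsn):
    """List the segments from the one holding start_lsn to the one holding end_lsn."""
    first = parse_lsn(start_lsn) // WAL_SEG_SIZE
    last = parse_lsn(end_lsn) // WAL_SEG_SIZE
    return [segment_name(tli, n) for n in range(first, last + 1)]
```

With a hypothetical checkpoint at 0/3000060 and a minimum recovery point
at 0/5000028, segments_needed(1, "0/3000060", "0/5000028") lists three
segments. Wiping pg_xlog on the target would discard whichever of those
it already holds.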

Attached is a rebased version of the previous patch to ignore the
contents of pg_xlog/ when rewinding.
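
For readers who do not want to open the patch, the idea of ignoring the
contents of pg_xlog/ can be sketched roughly as below. This is not the
patch's code (pg_rewind is written in C), and the function names are
hypothetical. The sketch assumes file lists are relative paths from the
data directory, and that the pg_xlog entry itself is kept while
everything beneath it is skipped.

```python
def is_pg_xlog_content(relpath):
    """True for anything stored under pg_xlog/, but not for pg_xlog itself."""
    parts = relpath.strip("/").split("/")
    return len(parts) > 1 and parts[0] == "pg_xlog"


def filter_file_list(paths):
    """Drop pg_xlog contents from a list of paths, keeping the original order."""
    return [p for p in paths if not is_pg_xlog_content(p)]
```

The same filter would be applied to both the source and the target
lists, so that WAL files are neither copied nor removed.
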
--
Michael

Attachment Content-Type Size
20150717_pgrewind_ignore_xlog.patch binary/octet-stream 4.3 KB
