Re: pg_rewind failure by file deletion in source server

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_rewind failure by file deletion in source server
Date: 2015-08-01 19:01:28
Message-ID: 55BD1788.3090803@iki.fi
Lists: pgsql-hackers

On 07/17/2015 06:28 AM, Michael Paquier wrote:
> On Wed, Jul 1, 2015 at 9:31 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Wed, Jul 1, 2015 at 2:21 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>> On 06/29/2015 09:44 AM, Michael Paquier wrote:
>>>>
>>>> On Mon, Jun 29, 2015 at 4:55 AM, Heikki Linnakangas wrote:
>>>>>
>>>>> But we'll still need to handle the pg_xlog symlink case somehow. Perhaps it
>>>>> would be enough to special-case pg_xlog for now.
>>>>
>>>>
>>>> Well, sure, pg_rewind does not copy the soft links either way. Now it
>>>> would be nice to have an option to be able to recreate the soft link
>>>> of at least pg_xlog even if it can be scripted as well after a run.
>>>
>>> Hmm. I'm starting to think that pg_rewind should ignore pg_xlog entirely. In
>>> any non-trivial scenarios, just copying all the files from pg_xlog isn't
>>> enough anyway, and you need to set up a recovery.conf after running
>>> pg_rewind that contains a restore_command or primary_conninfo, to fetch the
>>> WAL. So you can argue that by not copying pg_xlog automatically, we're
>>> actually doing a favour to the DBA, by forcing him to set up the
>>> recovery.conf file correctly. Because if you just test simple scenarios
>>> where not much time has passed between the failover and running pg_rewind,
>>> it might be enough to just copy all the WAL currently in pg_xlog, but it
>>> would not be enough if more time had passed and not all the required WAL is
>>> present in pg_xlog anymore. And by not copying the WAL, we can avoid some
>>> copying, as restore_command or streaming replication will only copy what's
>>> needed, while pg_rewind would copy all WAL it can find to the target's data
>>> directory.
>>>
>>> pg_basebackup also doesn't include any WAL, unless you pass the --xlog
>>> option. It would be nice to also add an optional --xlog option to pg_rewind,
>>> but with pg_rewind it's possible that all the required WAL isn't present in
>>> the pg_xlog directory anymore, so you wouldn't always achieve the same
>>> effect of making the backup self-contained.
>>>
>>> So, I propose the attached. It makes pg_rewind ignore the pg_xlog directory
>>> in both the source and the target.
>>
>> If pg_xlog is simply ignored, some old WAL files may remain in target server.
>> Don't these old files cause the subsequent startup of target server as new
>> standby to fail? That is, it's the case where a WAL file with the same name
>> but different content exists in both target and source. If that's harmful,
>> pg_rewind also should remove the files in pg_xlog of target server.
>
> This would reduce usability. The rewound node will replay WAL from the
> previous checkpoint where WAL forked up to the minimum recovery point
> of source node where pg_rewind has been run. Hence if we remove
> completely the contents of pg_xlog we'd lose a portion of the logs
> that need to be replayed until timeline is switched on the rewound
> node when recovering it (while streaming from the promoted standby,
> whatever). I don't really see why recycled segments would be a
> problem, as that's perhaps what you are referring to, but perhaps I am
> missing something.

Hmm. My thinking was that you need to set up restore_command or
primary_conninfo anyway, to fetch the old WAL, so there's no need to
copy any WAL. But there's a problem with that: you might have WAL files
in the source server that haven't been archived yet, and you need them
to recover the rewound target node. That's OK for libpq mode, I think, as
the server is still running and presumably you can fetch the WAL with
streaming replication, but for copy-mode, that's not a good assumption.
You might be relying on a WAL archive, and the file might not be archived
yet.
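
To make the copy-mode concern concrete, here is a minimal Python sketch. It
is not pg_rewind code: the function name `unarchived_segments` and the
segment names are hypothetical, and it assumes the segment file names in the
source's pg_xlog and in the WAL archive can simply be listed and compared.

```python
def unarchived_segments(source_pg_xlog, archive):
    """Return WAL segment names found in the source's pg_xlog but not in the
    archive, sorted by name. Inputs are plain lists of file names."""
    return sorted(set(source_pg_xlog) - set(archive))


# Hypothetical example: only the oldest segment has been archived so far.
source = [
    "000000020000000000000003",
    "000000020000000000000004",
    "000000020000000000000005",
]
archive = ["000000020000000000000003"]

# These segments exist only in the source's pg_xlog, so a restore_command
# reading from the archive could not supply them.
missing = unarchived_segments(source, archive)
```

Any segment in `missing` that the rewound target needs can only come from
the source's pg_xlog, which in copy-mode means copying it directly.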

Perhaps it's best if we copy all the WAL files from source in copy-mode,
but not in libpq mode. Regarding old WAL files in the target, it's
probably best to always leave them alone. They should do no harm, and as
a general principle it's best to avoid destroying evidence.
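
The rule proposed above can be sketched as a small decision function. This
is an illustration in Python, not the actual pg_rewind implementation (which
is written in C); the name `wal_action` and the mode labels "copy" and
"libpq" are assumptions made for the example. How a same-named file that
already exists in the target is treated is not modelled here.

```python
def wal_action(mode, in_source):
    """Decide what to do with one file under pg_xlog.

    mode:      "copy" or "libpq" (hypothetical labels for the two modes)
    in_source: True if the file exists in the source's pg_xlog

    Returns "copy" to fetch the file from the source, or "leave" to do
    nothing. There is deliberately no "remove" outcome: old WAL files in
    the target are always left alone.
    """
    if mode not in ("copy", "libpq"):
        raise ValueError("unknown mode: %r" % (mode,))
    if mode == "copy" and in_source:
        return "copy"
    # libpq mode: WAL is expected to arrive via streaming replication.
    # Target-only files: kept as they are.
    return "leave"
```

For example, `wal_action("copy", True)` gives "copy", while
`wal_action("libpq", True)` and `wal_action("copy", False)` both give
"leave".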

It'd be nice to get some fix for this for alpha2, so I'll commit a fix
to do that on Monday, unless we come to a different conclusion before that.

- Heikki
