Re: Promoting a standby during base backup (was Re: Switching timeline over streaming replication)

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Promoting a standby during base backup (was Re: Switching timeline over streaming replication)
Date: 2012-10-08 12:17:58
Message-ID: CA+U5nMKE6cS9tNz6GOTX5x6w263tYAJ6BmPzW=BnXu0srt=DqA@mail.gmail.com
Lists: pgsql-hackers

On 4 October 2012 18:07, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Oct 4, 2012 at 4:59 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> On 03.10.2012 18:15, Amit Kapila wrote:
>>>
>>> On Tuesday, October 02, 2012 4:21 PM Heikki Linnakangas wrote:
>>>>
>>>> Hmm, should a base backup be aborted when the standby is promoted? Does
>>>> the promotion render the backup corrupt?
>>>
>>>
>>> I think currently it does so. Pls refer
>>> 1.
>>> do_pg_stop_backup(char *labelfile, bool waitforarchive)
>>> {
>>> ..
>>>     if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery)
>>>         ereport(ERROR,
>>>                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>                  errmsg("the standby was promoted during online backup"),
>>>                  errhint("This means that the backup being taken is corrupt "
>>>                          "and should not be used. "
>>>                          "Try taking another online backup.")));
>>> ..
>>>
>>> }
>>
>>
>> Okay. I think that check in do_pg_stop_backup() actually already ensures
>> that you don't end up with a corrupt backup, even if the standby is promoted
>> while a backup is being taken. Admittedly it would be nicer to abort it
>> immediately rather than error out at the end.
>>
>> But I wonder why promoting a standby renders the backup invalid in the first
>> place? Fujii, Simon, can you explain that?
>
> Simon had the same question and I answered it before.
>
> http://archives.postgresql.org/message-id/CAHGQGwFU04oO8YL5SUcdjVq3BRNi7WtfzTy9wA2kXtZNHicTeA@mail.gmail.com
> ---------------------------------------
>> You say
>> "If the standby is promoted to the master during online backup, the
>> backup fails."
>> but no explanation of why?
>>
>> I could work those things out, but I don't want to have to, plus we
>> may disagree if I did.
>
> If the backup succeeds in that case, when we start an archive recovery from that
> backup, the recovery needs to cross between two timelines. Which means that
> we need to set recovery_target_timeline before starting recovery. Whether
> recovery_target_timeline needs to be set or not depends on whether the standby
> was promoted during taking the backup. Leaving such a decision to a user seems
> fragile.

I accepted your answer before, but I think it should be challenged
now. This is definitely a time when you really want that backup, so
invalidating it for such a weak reason is not useful, even if I
understand your original thought.

Something that has concerned me is that we don't have an explicit
"timeline change record". We *say* we do that at shutdown checkpoints,
but that is recorded in the new timeline. So we have the strange
situation of changing timeline at two separate places.

When we change timeline we really should generate one last WAL record on the
old timeline that marks an explicit change of timeline and a single
exact moment when the timeline change takes place. With PITR we are
unable to do that, because any timeline can fork at any point. With
smooth switchover we have a special case that is not "anything goes"
and there is a good case for not incrementing the timeline at all.
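
To make the idea a little more concrete, here is a minimal sketch in C of
what such a record might carry and how recovery could use it. Everything
in it is hypothetical: TimelineSwitchRecord, timeline_at() and
recovery_crosses_switch() do not exist in PostgreSQL, and TimeLineID and
XLogRecPtr are simplified integer stand-ins for the real types. It only
illustrates that with a single exact switch point, recovery could work out
for itself whether a backup spans two timelines, rather than relying on
the user to set recovery_target_timeline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL types (assumption). */
typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical record: the last WAL record written on the old timeline,
 * marking the single exact point where the timeline changes.
 */
typedef struct TimelineSwitchRecord
{
    TimeLineID  prev_tli;    /* timeline this record is written on */
    TimeLineID  new_tli;     /* timeline that begins at switch_lsn */
    XLogRecPtr  switch_lsn;  /* first WAL position belonging to new_tli */
} TimelineSwitchRecord;

/* Return the timeline that owns WAL position lsn, given one switch. */
TimeLineID
timeline_at(const TimelineSwitchRecord *rec, XLogRecPtr lsn)
{
    assert(rec != NULL);
    return (lsn < rec->switch_lsn) ? rec->prev_tli : rec->new_tli;
}

/*
 * Return true if replaying from backup_start_lsn to target_lsn has to
 * cross the timeline switch described by rec.
 */
bool
recovery_crosses_switch(const TimelineSwitchRecord *rec,
                        XLogRecPtr backup_start_lsn,
                        XLogRecPtr target_lsn)
{
    return timeline_at(rec, backup_start_lsn) != timeline_at(rec, target_lsn);
}
```

A backup started before switch_lsn and recovered to a point after it would
be reported as crossing the switch; one taken wholly on the new timeline
would not.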

This is still a half-formed thought, but at least you should know I'm
in the debate.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
