Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Feike Steenbergen <feikesteenbergen(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master
Date: 2015-06-05 14:06:47
Message-ID: CAHGQGwEKhv9ucj6CveQnkyYMQUeeHoJ=9BgY0ba9fFM3b37RRw@mail.gmail.com
Lists: pgsql-bugs

On Fri, Jun 5, 2015 at 3:01 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
>
>
> On Wed, Jun 3, 2015 at 1:04 AM, Fujii Masao wrote:
>>
>> On Mon, Jun 1, 2015 at 5:19 PM, Michael Paquier
>> >> Some testing shows us that in some cases, when pg_ctl promote is called
>> >> multiple
>> >> times, a promote file is left in the PGDATA directory, even though the
>> >> cluster
>> >> has been successfully promoted and is accepting read/write queries.
>> >
>> > This is not surprising: pg_ctl decides whether a node needs to be
>> > promoted based on whether recovery.conf exists, and there is an
>> > interval of time between the removal of recovery.conf and the actual
>> > triggering of the promotion, so you can create a promote file even
>> > after sending SIGUSR1 to the standby's postmaster.
>> >
>> >> We will try to work around this issue by ensuring we do not send
>> >> multiple
>> >> promote requests using pg_ctl to the same cluster.
>> >
>> > Well, we could for example have the server switch promote to
>> > promote_done in CheckForStandbyTrigger() and then unlink it when
>> > recovery.conf is switched to .done. Opinions are welcome on the
>> > matter.
>>
>> Or we can just always remove the signal file at the end of recovery.
>> That filename switch seems unnecessary.
>
>
> Well, by doing so, in the event of a crash during recovery the promote
> signal file would be present in PGDATA, and this would enforce a promotion
> at the next startup of the node. I don't think that this is a good idea. In
> the case of a promoted node crash a user may want to look at his node back
> in a recovery state.

You mean the case of a crash that occurs after pg_ctl promote is executed
but before CheckForStandbyTrigger() removes the signal file? If so, the
signal file would remain even if we rename it to an intermediate one.

If we want to address that corner case, we can additionally always remove
the file at the beginning of recovery. That would completely prevent an
unexpected promotion caused by a surviving signal file.
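
To make the proposal concrete, here is a rough sketch of "remove the signal
files at the beginning and at the end of recovery". It is only an
illustration in Python, not the server code: run_recovery() and its replay
callback are hypothetical names, and the only details taken from this thread
are the signal file names promote and fallback_promote in PGDATA.

```python
import os

# Signal file names mentioned in this thread; they live directly in PGDATA.
SIGNAL_FILES = ("promote", "fallback_promote")


def remove_signal_files(pgdata):
    """Unlink any promote signal files in pgdata and return the names removed."""
    removed = []
    for name in SIGNAL_FILES:
        try:
            os.unlink(os.path.join(pgdata, name))
            removed.append(name)
        except FileNotFoundError:
            pass  # no such signal file, nothing to clean up
    return removed


def run_recovery(pgdata, replay):
    """Hypothetical recovery driver (an assumption, not PostgreSQL's API).

    replay is a caller-supplied callable standing in for WAL replay.
    """
    # Beginning of recovery: drop files left over from a crash or a backup,
    # so they cannot trigger an unexpected promotion.
    remove_signal_files(pgdata)
    replay()
    # End of recovery: drop files created by repeated promote requests.
    remove_signal_files(pgdata)
```

With this ordering, a signal file that survives a crash, or that was created
by a second promote request during recovery, is gone by the time recovery
finishes.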

>
> Also, this intermediate promote file, let's say promote.detected, would be
> useful for external tools to let them know that the promotion has been
> acknowledged (you can already know it if your tool knows that a promote has
> been triggered, that promote has been removed by the server and if
> recovery.conf is still present). That's not something you would want on back
> branches btw as this changes how promotion behaves seen from an external
> point of view. But that would be a patch simple enough (got a WIP for people
> wondering).
>
> An open question would be what to do with pg_ctl promote if a promote file
> already exists. I think that we should ignore the creation of the promote
> file but still kick the signal SIGUSR1.
>
>>
>> In addition to that change, we should make pg_basebackup skip
>> the signal file?
>
>
> Well, yes, and we would be just fine for the case reported by Feike to
> just ignore promote and fallback_promote in a base backup, as the problem
> reported was about a standby that contained the signal promote file after
> pg_basebackup. And I think that we would be fine by doing that as well in
> the back-branches. trigger_file is not exposed out of xlog.c in the startup
> process, but I can live with the fact that it is not ignored.
>
> In short, I guess that the patch attached would be fine.
> Opinions?

I have no strong objection to that change, but it seems half-baked.
That is, it does not address at all the case where a base backup is
taken by some means other than pg_basebackup.
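
For example, a backup taken with a plain file-level copy would have to skip
the signal files on its own. The sketch below is only an illustration in
Python: copy_pgdata_without_signal_files() is a hypothetical helper, and it
deliberately leaves out everything else a real base backup needs (starting
and stopping backup mode, WAL handling, and so on).

```python
import os
import shutil

# Signal file names mentioned in this thread.
SIGNAL_FILES = {"promote", "fallback_promote"}


def copy_pgdata_without_signal_files(src, dst):
    """Copy the directory tree src to dst, skipping top-level signal files.

    Hypothetical helper for illustration; dst must not exist yet.
    """
    src_root = os.path.abspath(src)

    def ignore(directory, names):
        # Only the top level of PGDATA is filtered; files with the same
        # names in subdirectories are copied untouched.
        if os.path.abspath(directory) == src_root:
            return [n for n in names if n in SIGNAL_FILES]
        return []

    shutil.copytree(src_root, dst, ignore=ignore)
```

Every tool of this kind would need its own exclusion like this one, which is
why removing the files on the server side covers more cases.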

Regards,

--
Fujii Masao
