Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Feike Steenbergen <feikesteenbergen(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master
Date: 2015-06-05 06:01:32
Message-ID: CAB7nPqQ-KwDRroXoiz=V9zW+Or7ns90z8gD4LU1tpsNScb2U7w@mail.gmail.com
Lists: pgsql-bugs

On Wed, Jun 3, 2015 at 1:04 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Mon, Jun 1, 2015 at 5:19 PM, Michael Paquier
> >> Some testing shows us that in some cases, when pg_ctl promote is called
> >> multiple times, a promote file is left in the PGDATA directory, even
> >> though the cluster has been successfully promoted and is accepting
> >> read/write queries.
> >
> > This is not surprising: pg_ctl decides whether a node can be promoted
> > by checking whether recovery.conf exists, and there is an interval of
> > time between the moment the promotion is actually triggered and the
> > moment recovery.conf is removed, so you can create a promote file even
> > after sending SIGUSR1 to the standby's postmaster.
> >
> >> We will try to work around this issue by ensuring we do not send multiple
> >> promote requests using pg_ctl to the same cluster.
> >
> > Well, we could for example have the server switch promote to
> > promote_done in CheckForStandbyTrigger() and then unlink it when
> > recovery.conf is switched to .done. Opinions are welcome on the
> > matter.
>
> Or we can just always remove the signal file at the end of recovery.
> That filename switch seems unnecessary.
>

Well, by doing so, in the event of a crash during recovery the promote
signal file would still be present in PGDATA, and this would force a
promotion at the next startup of the node. I don't think that this is a
good idea. If a node crashes while being promoted, a user may want to bring
it back up in a recovery state and look at it.

Also, this intermediate promote file, let's say promote.detected, would be
useful for external tools, to let them know that the promotion has been
acknowledged (a tool can already infer this if it knows that a promotion
has been triggered, that the promote file has been removed by the server,
and that recovery.conf is still present). That's not something you would
want on back branches btw, as this changes how promotion behaves when seen
from an external point of view. But the patch would be simple enough (I
have a WIP for people wondering).
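
To make the file-based states above concrete, here is a rough sketch of how
an external tool might classify them. It is only an illustration, not server
or pg_ctl code: `promotion_state` and its return labels are made-up names,
and `promote.detected` is just the placeholder name suggested in this
message.

```python
import os
import tempfile


def promotion_state(pgdata):
    """Guess promotion progress from marker files in a data directory.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    def has(name):
        return os.path.exists(os.path.join(pgdata, name))

    if has("promote"):
        # Signal file written, but not yet consumed by the server.
        return "requested"
    if has("promote.detected"):
        # Assumed intermediate file from the proposal in this thread.
        return "acknowledged"
    if has("recovery.done"):
        # recovery.conf has been switched to .done at the end of recovery.
        return "promoted"
    if has("recovery.conf"):
        return "standby"
    return "unknown"


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "recovery.conf"), "w").close()
    print(promotion_state(demo_dir))
```

The checks are ordered so that a pending request wins over an acknowledged
one, which matters if a second promote file shows up while the first one is
still being processed.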

An open question is what pg_ctl promote should do if a promote file
already exists. I think that we should skip the creation of the promote
file but still send SIGUSR1.
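
As a sketch of that behaviour (written in Python for brevity, whereas pg_ctl
itself is C), the idea is "create the file only if it is missing, signal in
every case". The function name `request_promotion` and the injectable
`send_signal` parameter are assumptions made for illustration; this is not
how pg_ctl is structured.

```python
import os
import signal


def request_promotion(pgdata, postmaster_pid, send_signal=os.kill):
    """Create PGDATA/promote unless it already exists, then always signal.

    Returns True if the file was created by this call, False if it was
    already there. Hypothetical helper, for illustration only.
    """
    promote_path = os.path.join(pgdata, "promote")
    try:
        # O_EXCL makes the creation fail if the file already exists,
        # so an existing promote file is left untouched.
        fd = os.open(promote_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        os.close(fd)
        created = True
    except FileExistsError:
        created = False

    # The signal is sent in both cases, so a request is never silently lost.
    send_signal(postmaster_pid, signal.SIGUSR1)
    return created
```

Passing a fake `send_signal` makes it possible to exercise the logic without
a running postmaster.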

>
> In addition to that change, we should make pg_basebackup skip
> the signal file?
>

Well, yes. For the case reported by Feike it would be enough to just
ignore promote and fallback_promote in a base backup, as the problem
reported was about a standby that contained the promote signal file after
pg_basebackup. And I think that we would be fine doing that in the
back-branches as well. trigger_file is not exposed outside of xlog.c in the
startup process, but I can live with the fact that it is not ignored.
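
The filtering idea can be sketched as follows. This is not the attached
patch, which changes the server's C code; it is a Python illustration in
which `files_for_base_backup` and `SKIPPED_SIGNAL_FILES` are hypothetical
names, and which ignores everything else a real base backup has to skip.

```python
import os

# Signal files named in this thread; only top-level copies are skipped.
SKIPPED_SIGNAL_FILES = {"promote", "fallback_promote"}


def files_for_base_backup(pgdata):
    """Return relative paths under pgdata, minus the promote signal files."""
    selected = []
    for root, dirs, files in os.walk(pgdata):
        dirs.sort()  # deterministic traversal order
        for name in sorted(files):
            rel = os.path.relpath(os.path.join(root, name), pgdata)
            # rel equals the bare file name only for files directly in PGDATA.
            if rel in SKIPPED_SIGNAL_FILES:
                continue
            selected.append(rel)
    return selected
```

A standby built from such a file list would not find a leftover promote file
at its first startup, which is the failure mode reported in this bug.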

In short, I guess that the patch attached would be fine.
Opinions?
--
Michael

Attachment: 20150605_ignore_promote_file.patch (text/x-patch, 1.9 KB)
