Re: Function to promote standby servers

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Function to promote standby servers
Date: 2018-10-15 14:16:02
Message-ID: 183e19f71ae38f7447c2f63317f0108add6c6259.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Michael Paquier wrote:
> > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> > index 7375a78ffc..3a1f49e83a 100644
> > --- a/src/backend/access/transam/xlog.c
> > +++ b/src/backend/access/transam/xlog.c
> > @@ -81,8 +81,6 @@ extern uint32 bootstrap_data_checksum_version;
> > /* File path names (all relative to $PGDATA) */
> > #define RECOVERY_COMMAND_FILE "recovery.conf"
> > #define RECOVERY_COMMAND_DONE "recovery.done"
> > -#define PROMOTE_SIGNAL_FILE "promote"
> > -#define FALLBACK_PROMOTE_SIGNAL_FILE "fallback_promote"
>
> Perhaps we could just move the whole set to xlog.h.

Done.

> > +Datum
> > +pg_promote(PG_FUNCTION_ARGS)
> > +{
> > + FILE *promote_file;
> > +
> > + if (!superuser())
> > + ereport(ERROR,
> > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > + errmsg("must be superuser to promote standby servers")));
>
> Let's remove this restriction at code level, and instead use
> function-level ACLs so as it is possible to allow non-superusers to
> trigger a promotion as well, say a dedicated role. We could add a
> system role for this purpose, but I am not sure that it is worth the
> addition yet.

Agreed, done.

> > + while ($self->safe_psql('postgres', "SELECT pg_is_in_recovery()") != 'f')
> > + {
> > + sleep 1;
> > + }
> > + return;
>
> This could go on infinitely, locking down buildfarm machines silently if
> a commit is not careful. What I would suggest to do instead is to not
> touch PostgresNode.pm at all, and to just modify one test to trigger
> promotion with the SQL function. Then from the test, you should add a
> comment that triggerring promotion from SQL is wanted instead of the
> internal routine, and then please add a call to poll_query_until() in
> the same way as what 6deb52b2 has removed.

I have modified recovery/t/004_timeline_switch.pl and recovery/t/009_twophase.pl
accordingly: one calls the function with "wait => true", the other
uses "wait => false" and waits as you suggested.

> As of now, this function returns true if promotion has been triggered,
> and false if not. However it seems to me that we should have something
> more consistent with "pg_ctl promote", so there are actually three
> possible states:
> 1) Node is in recovery, with promotion triggered. pg_promote() returns
> true in case of success in creating the trigger file.
> 2) Node is in recovery, with promotion triggered. pg_promote() returns
> false in case of failure in creating the trigger file.
> 3) Node is not in recovery, where I think a call to pg_promote should
> trigger an error. This is not handled in the patch.

So far I had returned "false" in the last case, but I am fine with
throwing an error in that case. Done.

> An other thing which has value is to implement a "wait" mode for this
> feature, or actually a nowait mode. You could simply do what pg_ctl
> does with its logic in get_control_dbstate() by looking at the control
> file state. I think that waiting for the promotion to happen should be
> the default.

I have implemented that.

> At the end this patch needs a bit more work.

Yes, it did. Thanks for the thorough review!

Yours,
Laurenz Albe

Attachment Content-Type Size
0001-Add-pg_promote-to-promote-standby-servers-V4.patch text/x-patch 10.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-10-15 14:17:39 Re: Requesting advanced Group By support
Previous Message Fabien COELHO 2018-10-15 13:09:38 Re: [HACKERS] pgbench - allow to store select results into variables