Re: Function to promote standby servers

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Function to promote standby servers
Date: 2018-10-09 03:17:50
Message-ID: 20181009031750.GC2137@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 08, 2018 at 07:36:50PM +0200, Laurenz Albe wrote:
> The attached patch has regression tests - I though it would be good
> to change some of the existing tests that run standby promotion to
> use the SQL function instead of pg_ctl.
>
> I have left the name though -- as far as I can tell, "promote" has
> no other meaning in PostgreSQL than standby promotion, and I believe
> it is only good to be more verbose if that avoids confusion.

I am fine with pg_promote.

> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index 7375a78ffc..3a1f49e83a 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -81,8 +81,6 @@ extern uint32 bootstrap_data_checksum_version;
> /* File path names (all relative to $PGDATA) */
> #define RECOVERY_COMMAND_FILE "recovery.conf"
> #define RECOVERY_COMMAND_DONE "recovery.done"
> -#define PROMOTE_SIGNAL_FILE "promote"
> -#define FALLBACK_PROMOTE_SIGNAL_FILE "fallback_promote"

Perhaps we could just move the whole set to xlog.h.

> +Datum
> +pg_promote(PG_FUNCTION_ARGS)
> +{
> + FILE *promote_file;
> +
> + if (!superuser())
> + ereport(ERROR,
> + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> + errmsg("must be superuser to promote standby servers")));

Let's remove this restriction at code level, and instead use
function-level ACLs so as it is possible to allow non-superusers to
trigger a promotion as well, say a dedicated role. We could add a
system role for this purpose, but I am not sure that it is worth the
addition yet.

> + while ($self->safe_psql('postgres', "SELECT pg_is_in_recovery()") != 'f')
> + {
> + sleep 1;
> + }
> + return;

This could go on infinitely, locking down buildfarm machines silently if
a commit is not careful. What I would suggest to do instead is to not
touch PostgresNode.pm at all, and to just modify one test to trigger
promotion with the SQL function. Then from the test, you should add a
comment that triggerring promotion from SQL is wanted instead of the
internal routine, and then please add a call to poll_query_until() in
the same way as what 6deb52b2 has removed.

As of now, this function returns true if promotion has been triggered,
and false if not. However it seems to me that we should have something
more consistent with "pg_ctl promote", so there are actually three
possible states:
1) Node is in recovery, with promotion triggered. pg_promote() returns
true in case of success in creating the trigger file.
2) Node is in recovery, with promotion triggered. pg_promote() returns
false in case of failure in creating the trigger file.
3) Node is not in recovery, where I think a call to pg_promote should
trigger an error. This is not handled in the patch.

An other thing which has value is to implement a "wait" mode for this
feature, or actually a nowait mode. You could simply do what pg_ctl
does with its logic in get_control_dbstate() by looking at the control
file state. I think that waiting for the promotion to happen should be
the default.

At the end this patch needs a bit more work.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-10-09 03:23:11 Re: pgsql: Fix event triggers for partitioned tables
Previous Message Tom Lane 2018-10-09 02:49:50 Re: pread() and pwrite()