Re: Proposal for changes to recovery.conf API

From: Andres Freund <andres(at)anarazel(dot)de>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal for changes to recovery.conf API
Date: 2016-11-12 16:09:49
Message-ID: 20161112160949.xso4is7r5n2t25vq@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2016-11-01 05:14:58 +0530, Abhijit Menon-Sen wrote:
> Subject: Convert recovery.conf settings to GUCs

I really want to get this in, we've been back and forth about this for
far too long.

> <para>
>- Create a recovery command file <filename>recovery.conf</> in the cluster
>- data directory (see <xref linkend="recovery-config">). You might
>+ Set up recovery parameters in <filename>postgresql.conf</> (see
>+ <xref linkend="runtime-config-wal-archive-recovery"> and
>+ <xref linkend="runtime-config-wal-recovery-target">).
>+ </para>
>+ </listitem>
>+ <listitem>
>+ <para>
>+ Create a recovery trigger file <filename>recovery.trigger</>
>+ in the cluster data directory. You might
> also want to temporarily modify <filename>pg_hba.conf</> to prevent
> ordinary users from connecting until you are sure the recovery was successful.
> </para>

I think this means we need to rephrase a number of docs and code
references to trigger files, because they'll currently often refer to
the promotion trigger, no?

> diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
> index 7a16751..15ce784 100644
> --- a/src/backend/access/transam/recovery.conf.sample
> +++ b/src/backend/access/transam/recovery.conf.sample
> @@ -2,23 +2,14 @@
> # PostgreSQL recovery config file
> # -------------------------------
> #
> -# Edit this file to provide the parameters that PostgreSQL needs to
> -# perform an archive recovery of a database, or to act as a replication
> -# standby.
> +# PostgreSQL 10.0 or later, recovery.conf is no longer used. Instead,
> +# specify all recovery parameters in postgresql.conf and create
> +# recovery.trigger to enter archive recovery or standby mode.
> #
> -# If "recovery.conf" is present in the PostgreSQL data directory, it is
> -# read on postmaster startup. After successful recovery, it is renamed
> -# to "recovery.done" to ensure that we do not accidentally re-enter
> -# archive recovery or standby mode.
> +# If you must use recovery.conf, specify "include directives" in
> +# postgresql.conf like this:
> #
> -# This file consists of lines of the form:
> -#
> -# name = value
> -#
> -# Comments are introduced with '#'.
> -#
> -# The complete list of option names and allowed values can be found
> -# in the PostgreSQL documentation.
> +# include 'recovery.conf'

This should probably warn about the differences in "trigger" behaviour
of recovery.conf being present.

> #---------------------------------------------------------------------------
> # ARCHIVE RECOVERY PARAMETERS
> @@ -131,14 +122,6 @@
> #
> #primary_slot_name = ''

I don't think it makes much sense to still list all this?

> +bool standby_mode = false;

I'm very tempted to rename this during the move to GUCs.

> +char *primary_conninfo = NULL;
> +char *primary_slot_name = NULL;

Slightly less so, but still tempted to also rename these. They're not
actually necessarily pointing towards a primary, and namespace-wise
they're not grouped with recovery_*, which has become more important now
that recovery.conf isn't a separate namespace anymore.

> -static int recovery_min_apply_delay = 0;
> static TimestampTz recoveryDelayUntilTime;
>
> +static TimestampTz recoveryDelayUntilTime;
> +

Isn't this a repeated definition?

> /*
> * During normal operation, the only timeline we care about is ThisTimeLineID.
> * During recovery, however, things are more complicated. To simplify life
> @@ -577,10 +578,10 @@ typedef struct XLogCtlData
> TimeLineID PrevTimeLineID;
>
> /*
> - * archiveCleanupCommand is read from recovery.conf but needs to be in
> - * shared memory so that the checkpointer process can access it.
> + * archive_cleanup_command is read from recovery.conf but needs to
> + * be in shared memory so that the checkpointer process can access it.
> */

References recovery.conf. It'd be a good idea to grep for all remaining
references to recovery.conf.

> +static bool
> +check_recovery_target(char **newval, void **extra, GucSource source)
> +{
> + RecoveryTargetType *myextra;
> + RecoveryTargetType rt = RECOVERY_TARGET_UNSET;
> +
> + if (strcmp(*newval, "xid") == 0)
> + rt = RECOVERY_TARGET_XID;
> + else if (strcmp(*newval, "time") == 0)
> + rt = RECOVERY_TARGET_TIME;
> + else if (strcmp(*newval, "name") == 0)
> + rt = RECOVERY_TARGET_NAME;
> + else if (strcmp(*newval, "lsn") == 0)
> + rt = RECOVERY_TARGET_LSN;
> + else if (strcmp(*newval, "immediate") == 0)
> + rt = RECOVERY_TARGET_IMMEDIATE;
> + else if (strcmp(*newval, "") != 0)
> + {
> + GUC_check_errdetail("recovery_target is not valid: \"%s\"", *newval);
> + return false;
> + }
> +
> + myextra = (RecoveryTargetType *) guc_malloc(ERROR, sizeof(RecoveryTargetType));
> + *myextra = rt;
> + *extra = (void *) myextra;
> +
> + return true;
> +}

Shouldn't this instead be an enum type GUC?

> +static bool
> +check_recovery_target_time(char **newval, void **extra, GucSource source)
> +{
> + TimestampTz time;
> + TimestampTz *myextra;
> + MemoryContext oldcontext = CurrentMemoryContext;
> +
> + PG_TRY();
> + {
> + time = (strcmp(*newval, "") == 0) ?
> + 0 :
> + DatumGetTimestampTz(DirectFunctionCall3(timestamptz_in,
> + CStringGetDatum(*newval),
> + ObjectIdGetDatum(InvalidOid),
> + Int32GetDatum(-1)));
> + }
> + PG_CATCH();
> + {
> + ErrorData *edata;
> +
> + /* Save error info */
> + MemoryContextSwitchTo(oldcontext);
> + edata = CopyErrorData();
> + FlushErrorState();
> +
> + /* Pass the error message */
> + GUC_check_errdetail("%s", edata->message);
> + FreeErrorData(edata);
> + return false;
> + }
> + PG_END_TRY();

Hm, I'm not happy to do that kind of thing. What if there's ever any
database interaction added to timestamptz_in?

It's also problematic because the parsing of timestamps depends on the
timezone GUC - which might not yet have taken effect...

> +static bool
> +check_recovery_target_lsn(char **newval, void **extra, GucSource source)
> +{
> + XLogRecPtr lsn;
> + XLogRecPtr *myextra;
> + MemoryContext oldcontext = CurrentMemoryContext;
> +
> + PG_TRY();
> + {
> + lsn = (strcmp(*newval, "") == 0) ?
> + InvalidXLogRecPtr :
> + DatumGetLSN(DirectFunctionCall3(pg_lsn_in,
> + CStringGetDatum(*newval),
> + ObjectIdGetDatum(InvalidOid),
> + Int32GetDatum(-1)));
> + }
> + PG_CATCH();
> + {
> + ErrorData *edata;
> +
> + /* Save error info */
> + MemoryContextSwitchTo(oldcontext);
> + edata = CopyErrorData();
> + FlushErrorState();
> +
> + /* Pass the error message */
> + GUC_check_errdetail("%s", edata->message);
> + FreeErrorData(edata);
> + return false;
> + }
> + PG_END_TRY();

That seems like a pretty pointless use of pg_lsn_in, given the problems
that PG_CATCHing an error without rethrowing brings.

> +static bool
> +check_recovery_target_action(char **newval, void **extra, GucSource source)
> +{
> + RecoveryTargetAction rta = RECOVERY_TARGET_ACTION_PAUSE;
> + RecoveryTargetAction *myextra;
> +
> + if (strcmp(*newval, "pause") == 0)
> + rta = RECOVERY_TARGET_ACTION_PAUSE;
> + else if (strcmp(*newval, "promote") == 0)
> + rta = RECOVERY_TARGET_ACTION_PROMOTE;
> + else if (strcmp(*newval, "shutdown") == 0)
> + rta = RECOVERY_TARGET_ACTION_SHUTDOWN;
> + else if (strcmp(*newval, "") != 0)
> + {
> + GUC_check_errdetail("recovery_target_action is not valid: \"%s\"", *newval);
> + return false;
> + }
> +
> + myextra = (RecoveryTargetAction *) guc_malloc(ERROR, sizeof(RecoveryTargetAction));
> + *myextra = rta;
> + *extra = (void *) myextra;
> +
> + return true;
> +}

Should be an enum.

> +static bool
> +check_primary_slot_name(char **newval, void **extra, GucSource source)
> +{
> + if (strcmp(*newval,"") != 0 &&
> + !ReplicationSlotValidateName(*newval, WARNING))
> + {
> + GUC_check_errdetail("primary_slot_name is not valid: \"%s\"", *newval);
> + return false;
> + }
> +
> + return true;
> +}

ReplicationSlotValidateName will emit WARNINGs this way - hm. That'll
often loose detail about what the problem actually is, e.g. at server
startup.

> +#standby_mode = off # "on" starts the server as a standby
> + # (change requires restart)
> +#primary_conninfo = '' # connection string to connect to the master
> +#trigger_file = '' # trigger file to promote the standby

Too general a name imo.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-11-12 16:30:42 Re: Re: [COMMITTERS] pgsql: Change the way that LWLocks for extensions are allocated.
Previous Message Andres Freund 2016-11-12 15:38:30 Re: move collation import to backend