Re: Turning recovery.conf into GUCs

From: Alex Shulgin <ash(at)commandprompt(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Turning recovery.conf into GUCs
Date: 2014-11-21 17:35:15
Message-ID: 87389ctq24.fsf@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

Here's an attempt to revive this patch.

It is rebased onto the latest master and also includes handling and
documentation of newly added recovery.conf parameters such as
primary_slot_name, recovery_min_apply_delay and
recovery_target='immediate'.

The following feedback had been addressed:

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> >
>> > * --write-standby-enable seems to loose quite some functionality in
>> > comparison to --write-recovery-conf since it doesn't seem to set
>> > primary_conninfo, standby anymore.
>>
>> we can add code that looks for postgresql.conf in $PGDATA but if
>> postgresql.conf is not there (like the case in debian, there is not
>> much we can do about it) or we can write a file ready to be included
>> in postgresql.conf, any sugestion?
>
> People might not like me for the suggestion, but I think we should
> simply always include a 'recovery.conf' in $PGDATA
> unconditionally. That'd make this easier.
> Alternatively we could pass a filename to --write-recovery-conf.

Well, the latest version of this patch fails to start when it sees
'recovery.conf' in PGDATA:

FATAL: "recovery.conf" is not supported anymore as a recovery method
DETAIL: Refer to appropriate documentation about migration methods

I've missed all the discussion behind this decision and after reading
the ALTER SYSTEM/conf.d thread I'm even more confused so I'd like
someone more knowledgeable to speak up on the status of this.

Do we want to keep this behavior of the patch?

>> > * CheckRecoveryReadyFile() doesn't seem to be a very descriptive
>> > function name.
>>
>> I left it as CheckStartingAsStandby() but i still have a problem of
>> this not being completely clear. this function is useful for standby
>> or pitr.

There's not much left for this function in the current patch version, so
maybe we should just move it to StartupXLOG (it's not called from
anywhere else either way).

>> which leads me to the other problem i have: the recovery trigger file,
>> i have left it as standby.enabled but i still don't like it.
>
>> recovery.trigger (Andres objected on this name)
>> forced_recovery.trigger
>> user_forced_recovery.trigger
>
> stay_in_recovery.trigger? That'd be pretty clear for anybody involved in
> pg, but otherwise...

The docs use the term "continuous recovery".

Either way, from the code it is clear that we only stay in recovery if
standby_mode is directly turned on. This makes the whole check for a
specially named file unnecessary, IMO: we should just check the value of
standby_mode (which is off by default).

By the way, is there any use in setting standby_mode=on and any of the
recovery_target* GUCs at the same time?

I think it can only play together if you set the target farther than the
latest point you've got in the archive locally. So that's sort of
"Point-in-Future-Recovery". Does that make any sense at all?

>> > * the description of archive_cleanup_command seems wrong to me.
>>
>> why? it seems to be the same that was in recovery.conf. where did you
>> see the description you're complaining at?
>
> I dislike the description in guc.c
>
>> + {"archive_cleanup_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY,
>> + gettext_noop("Sets the shell command that will be executed at every restartpoint."),
>> + NULL
>> + },
>> + &archive_cleanup_command,
>
> previously it was:
>
>> -# specifies an optional shell command to execute at every restartpoint.
>> -# This can be useful for cleaning up the archive of a standby server.

Expanded the GUC desc.

>> > * Why the PG_TRY/PG_CATCH in check_recovery_target_time? Besides being
>> > really strangely formatted (multiline :? inside a function?) it
>> > doesn't a) seem to be correct to ignore potential memory allocation
>> > errors by just switching back into the context that just errored out,
>> > and continue to work there b) forgo cleanup by just continuing as if
>> > nothing happened. That's unlikely to be acceptable.
>>
>> the code that read recovery.conf didn't has that, so i just removed it
>
> Well, that's not necessarily correct. recovery.conf was only read during
> startup, while this is read on SIGHUP.
>
[copied from the bottom, related]
>
> I don't think that's correct. Afaics it will cause the postmaster to
> crash on a SIGHUP with invalid data. I think you're unfortunately going
> to have to copy a fair bit of timestamptz_in() and even
> DateTimeParseError(), replacing the ereport()s by the guc error
> reporting.

The use of PG_TRY/CATCH does protect from FATALs in SIGHUP indeed.
Using CopyErrorData() we can also fetch the actual error message from
timestamptz_in, though I wonder we really have to make a full copy.

>> > * Why do you include xlog_internal.h in guc.c and not xlog.h?
>
>> we actually need both but including xlog_internal.h also includes xlog.h
>> i added xlog.h and if someone things is enough only putting
>> xlog_internal.h let me know
>
> What's required from xlog_internal.h?

Looks like this was addressed before me.

>> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
>> index b53ae87..54f6a0d 100644
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> @@ -64,11 +64,12 @@
>> extern uint32 bootstrap_data_checksum_version;
>>
>> /* File path names (all relative to $PGDATA) */
>> -#define RECOVERY_COMMAND_FILE "recovery.conf"
>> -#define RECOVERY_COMMAND_DONE "recovery.done"
>> +#define RECOVERY_ENABLE_FILE "standby.enabled"
>
> Imo the file and variable names should stay coherent.

Yes, once we settle on the name (and if we really need that extra
trigger file.)

>> +/* recovery.conf is not supported anymore */
>> +#define RECOVERY_COMMAND_FILE "recovery.conf"
>
>> +bool StandbyModeRequested = false;
>> +static TimestampTz recoveryDelayUntilTime;
>
> This imo should be lowercase now, the majority of GUC variables are.

This is not a GUC variable, though it's calculated based on a GUC
recovery_min_apply_delay.

>> +/* are we currently in standby mode? */
>> +bool StandbyMode = false;
>
> Why did you move this?

It was easy to move it back though.

>> - if (rtliGiven)
>> + if (strcmp(recovery_target_timeline_string, "") != 0)
>> {
>
> Why not have the convention that NULL indicates a unset target_timeline
> like you use for other GUCs? Mixing things like this is confusing.
>
> Why is recovery_target_timeline stored as a string? Because it's a
> unsigned int? If so, you should have an assign hook setting up a)
> rtliGiven, b) properly typed variable.

Yes, I believe setting these to NULL by default makes a lot more sense.
Then one can check if there was a non-default setting by checking the
*_string variable, which is not possible with int or TimestampTz.

>> - if (rtli)
>> + if (recovery_target_timeline)
>> {
>> /* Timeline 1 does not have a history file, all else should */
>> - if (rtli != 1 && !existsTimeLineHistory(rtli))
>> + if (recovery_target_timeline != 1 &&
>> + !existsTimeLineHistory(recovery_target_timeline))
>> ereport(FATAL,
>> (errmsg("recovery target timeline %u does not exist",
>> - rtli)));
>> - recoveryTargetTLI = rtli;
>> + recovery_target_timeline)));
>> + recoveryTargetTLI = recovery_target_timeline;
>> recoveryTargetIsLatest = false;
>
> So, now we have a recoveryTargetTLI and recovery_target_timeline
> variable? Really? Why do we need recoveryTargetTLI at all now?

Looks like we still do need both of them. The initial value of
recoveryTargetTLI is derived from pg_control, but later it can be
overriden by this setting.

However, if the recovery_target_timeline was "latest" we need to find
the latest TLI, based on the initial value from pg_control. But we
can't set final recoveryTargetTLI value in the GUC assign hook, because
we might need to fetch some history files first.

>> +static void
>> +assign_recovery_target_time(const char *newval, void *extra)
>> +{
>> + recovery_target_time = *((TimestampTz *) extra);
>> +
>> + if (recovery_target_xid != InvalidTransactionId)
>> + recovery_target = RECOVERY_TARGET_XID;
>> + else if (recovery_target_name[0])
>> + recovery_target = RECOVERY_TARGET_NAME;
>> + else if (recovery_target_time != 0)
>> + recovery_target = RECOVERY_TARGET_TIME;
>> + else
>> + recovery_target = RECOVERY_TARGET_UNSET;
>> +}
>> +
>
> I don't think it's correct to do such hangups in the assign hook - you
> have no ideas in which order they will be called and such. Imo that
> should happen at startup, like we also do it for other interdependent
> variables like wal_buffers.

Yeah, that looked weird. Moved to StartupXLOG().

I disliked the strtoul handling in the earlier version of the patch,
especially given that with the base=0 it can parse 0x-prefixed hex
strings. I would rather error out on non-hex digit instead of stopping
and calling it OK. This change is included in the new version.

Should we really allow specifying negative values for XID/timeline?
Right now it will happily consume "-1" for recovery_target_xid and
complain if it's out of range, like this:

LOG: starting point-in-time recovery to XID 4294967295
LOG: invalid primary checkpoint record
LOG: invalid secondary checkpoint record

Allowing negative values makes even less sense for timelines, IMO.

--
Alex

Attachment Content-Type Size
recovery_guc_v5.3.patch text/x-diff 120.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2014-11-21 17:55:01 Re: Turning recovery.conf into GUCs
Previous Message Tom Lane 2014-11-21 17:32:35 Re: psql \sf doesn't show it's SQL when ECHO_HIDDEN is on