Re: Allow some recovery parameters to be changed with reload

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: sk(at)zsrv(dot)org
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, a(dot)lubennikova(at)postgrespro(dot)ru, robertmhaas(at)gmail(dot)com, michael(at)paquier(dot)xyz, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Allow some recovery parameters to be changed with reload
Date: 2020-11-10 11:53:21
Message-ID: 20201110.205321.1681552377835477837.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

At Tue, 10 Nov 2020 14:13:17 +0300, Sergei Kornilov <sk(at)zsrv(dot)org> wrote in
> Hello
>
> > If someone changes restore_command to '' then reload while crash
> > recovery is running, the server stops for no valid reason.
>
> While *crash* recovery is running? It's possible only during Point-in-Time Recovery, no?

Even if PITR is commanded, crash recovery can run before starting
archive recovery if the server was not gracefully shut down.
Parameter reload can happen while crash recovery. And
validateRecoveryParameters() calls "ereport(FATAL" in that case.

> At the beginning of validateRecoveryParameters we check ArchiveRecoveryRequested, which can be set in two cases:

That does not prevent crash recovery from running.

> * if recovery.signal found - same check on recovery start. Otherwise it is possible to early end recovery because of empty restore_command. So we want to protect the user from such misconfiguration? I am fine if we decide that no additional handling is needed.
> * if standby.signal found - this FATAL is not reachable because StandbyModeRequested is also set.
>
> During crash recovery validateRecoveryParameters does nothing.

> > If restore_command is set to 'hoge' (literally:p, that is, anything
> > unexecutable) and send SIGHUP while archive recovery is running, the
> > server stops. I think we need to handle these cases more gracefully,
>
> I think we can not perform such check reliable. As in my example earlier:
>
> > restore_command = '. /etc/wal-g/WALG_AWS_ENV; wal-g wal-fetch "%f" "%p"'
>
> How do we find the commands first? For any shell? And even: we learned that the binary is unexecutable. But what to do next?

I don't suggest to check if the command actually works, I suggested to
avoid server stop even if the parameters failed to run after a
config-reload.

> > If someone changes restore_command by mistake to something executable
> > but fails to offer the specfied file even if it exists, the running
> > archive recovery finishes then switches timeline unexpectedly.
>
> Or executable file was just removed. Which is clearly a pilot error. Is this differs from changing restore_command?

I don't know. I just think that it is not proper that "ALTER SYSTEM" +
config-reload causes server stop.

> >>  I do not know the history of this fatal ereport. It looks like "must specify restore_command when standby mode is not enabled" check is only intended to protect the user from misconfiguration and the rest code will treat empty restore_command correctly, just like /bin/false. Did not notice anything around StandbyMode conditions.
> >
> > If restore_command is not changable after server-start, it would be
> > valid for startup to stop for inexecutable content for the variable
> > since there's no way to proceed recovery.
>
> Why not use local pg_wal? There may be already enough WAL.

Mmm. If the file to read is in pg_wal, restore_command won't be
executed in the first place?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2020-11-10 12:12:49 Re: -O switch
Previous Message Kyotaro Horiguchi 2020-11-10 11:35:57 Re: Autovacuum on partitioned table (autoanalyze)