Re: allow specifying action when standby encounters incompatible parameter settings

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: nathandbossart(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: allow specifying action when standby encounters incompatible parameter settings
Date: 2022-04-14 02:36:11
Message-ID: 20220414.113611.1900283723994151474.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Wed, 13 Apr 2022 14:35:21 -0700, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote in
> Hi hackers,
>
> As of 15251c0, when a standby encounters an incompatible parameter change,
> it pauses replay so that read traffic can continue while the administrator
> fixes the parameters. Once the server is restarted, replay can continue.
> Before this change, such incompatible parameter changes caused the standby
> to immediately shut down.
>
> I noticed that there was some suggestion in the thread associated with
> 15251c0 [0] for making this behavior configurable, but there didn't seem to
> be much interest at the time. I am interested in allowing administrators
> to specify the behavior before 15251c0 (i.e., immediately shut down the
> standby when an incompatible parameter change is detected). The use-case I
> have in mind is when an administrator has automation in place for adjusting
> these parameters and would like to avoid stopping replay any longer than
> necessary. FWIW this is what we do in RDS.
>
> I've attached a patch that adds a new GUC where users can specify the
> action to take when an incompatible parameter change is detected on a
> standby. For now, there are just two options: 'pause' and 'shutdown'.
> This new GUC is largely modeled after recovery_target_action.

The overall direction of shutting down without requiring user
interaction seems fine. I think the same could be achieved with a
timeout. That is, we could provide a GUC named something like
insufficient_standby_setting_shutdown_timeout (hmm, too long..);
recovery would pause for that duration and then shut down. -1 would
mean the current behavior, and 0 would mean what this patch is going
to introduce. However, I don't see a concrete use case for the timeout.
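
To make the timeout semantics above concrete, here is a minimal sketch
in C. The names (StandbyAction, decide_action, timeout_ms, paused_ms)
are hypothetical and come from neither the patch nor the PostgreSQL
source; the sketch only illustrates the -1 / 0 / positive cases.

```c
/* Hypothetical names; not taken from the patch or from PostgreSQL. */
typedef enum
{
    STANDBY_ACTION_PAUSE,       /* keep replay paused, keep serving reads */
    STANDBY_ACTION_SHUTDOWN     /* stop the standby */
} StandbyAction;

/*
 * Decide what to do after an incompatible parameter change was detected.
 *
 * timeout_ms: -1 = stay paused indefinitely (the current behavior),
 *              0 = shut down immediately (what the patch would add),
 *             >0 = stay paused for that long, then shut down.
 * paused_ms:  how long replay has been paused so far.
 */
StandbyAction
decide_action(int timeout_ms, long paused_ms)
{
    if (timeout_ms < 0)
        return STANDBY_ACTION_PAUSE;
    if (paused_ms >= timeout_ms)
        return STANDBY_ACTION_SHUTDOWN;
    return STANDBY_ACTION_PAUSE;
}
```

With this shape, a timeout of 0 leads to shutdown on the first check,
while -1 never does, so a single integer setting would cover both the
existing behavior and the proposed one.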

> I initially set out to see if it was possible to automatically adjust these
> parameters on a standby, but that is considerably more difficult. It isn't
> enough to just hook into the restart_after_crash functionality since it
> doesn't go back far enough in the postmaster logic. IIUC we'd need to
> reload preloaded libraries (which there is presently no support for),
> recalculate MaxBackends, etc. Another option I considered was to

Sure.

> automatically adjust the parameters during startup so that you just need to
> restart the server. However, we need to know for sure that the server is
> going to be a hot standby, and I don't believe we have that information
> where such GUC changes would need to occur (I could be wrong about this).

Couldn't we use AlterSystemSetConfigFile for this purpose in
CheckRequiredParameterValues?

> Anyway, for now I'm just proposing the modest change described above, but
> I'd welcome any discussion about improving matters further in this area.
>
> [0] https://postgr.es/m/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b%402ndquadrant.com

Is the enum used for extensibility, so that a new choice like
"auto-adjust" could be added later?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
