Re: Unexpected Standby Shutdown on sync_replication_slots change

From: Hugo DUBOIS <hdubois(at)scaleway(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Unexpected Standby Shutdown on sync_replication_slots change
Date: 2025-07-25 09:39:54
Message-ID: CAH0PTU_iU=qx83h_Ud5YWCCMHFTvuVbxzR8WiYft01rJns+U8g@mail.gmail.com
Lists: pgsql-bugs

Thanks for the quick patch. I've tested it on the REL_17_STABLE branch, and
it's working fine.

On a standby node, after dynamically setting the sync_replication_slots
parameter, I observed the following logs. The instance did not shut down,
which seems correct:

2025-07-25 09:26:34.613 UTC [4420] LOG: received SIGHUP, reloading configuration files
2025-07-25 09:26:34.614 UTC [4420] LOG: parameter "sync_replication_slots" changed to "on"
2025-07-25 09:26:34.615 UTC [4420] LOG: replication slot synchronization requires "wal_level" >= "logical"
2025-07-25 09:27:34.662 UTC [4420] LOG: replication slot synchronization requires "wal_level" >= "logical"

As expected, the instance then refused to restart and logged this FATAL message:

2025-07-25 09:27:45.668 UTC [4430] FATAL: replication slot synchronization ("sync_replication_slots" = on) requires "wal_level" >= "logical"

I have a couple of observations:

- With this patch, a primary instance will not restart if the
  configuration is incorrect.

- Only wal_level is checked, but the ValidateSlotSyncParams function
  includes other mandatory parameters. These are not being checked during
  startup.
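
To make the second observation concrete, here is a minimal standalone sketch, not PostgreSQL source code. It assumes a single validation routine that takes the report level from its caller and covers every mandatory setting, so the reload path and the startup path could share the same checks. All `sketch_*` names, the enums, and the settings struct are hypothetical; only the two message texts are taken from this thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for PostgreSQL's report levels and wal_level values. */
enum sketch_level { SKETCH_LOG, SKETCH_FATAL };
enum sketch_wal_level { SKETCH_WAL_MINIMAL, SKETCH_WAL_REPLICA, SKETCH_WAL_LOGICAL };

/* Hypothetical snapshot of the settings relevant to slot synchronization. */
struct sketch_settings
{
    enum sketch_wal_level wal_level;
    bool hot_standby_feedback;
};

/* Print the message at the requested level and return false for convenience. */
static bool
sketch_report(enum sketch_level level, const char *msg)
{
    fprintf(stderr, "%s: %s\n", level == SKETCH_FATAL ? "FATAL" : "LOG", msg);
    return false;
}

/*
 * Return true only if every checked setting allows slot synchronization.
 * The function never exits on its own; the caller decides what a failure means.
 */
bool
sketch_validate_slot_sync(const struct sketch_settings *s, enum sketch_level level)
{
    if (s->wal_level < SKETCH_WAL_LOGICAL)
        return sketch_report(level,
            "replication slot synchronization requires \"wal_level\" >= \"logical\"");

    if (!s->hot_standby_feedback)
        return sketch_report(level,
            "replication slot synchronization requires \"hot_standby_feedback\" to be enabled");

    /* Any further mandatory parameters would be checked here in the same way. */
    return true;
}
```

In this sketch, a configuration-reload path would pass SKETCH_LOG and simply not start the worker when the function returns false, while a startup path would pass SKETCH_FATAL and refuse to start. Whether the real startup check should cover all parameters is exactly the open question raised above.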

Regards,

Hugo

On Fri, Jul 25, 2025 at 08:16, shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:

> On Fri, Jul 25, 2025 at 12:20 AM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> >
> > On Fri, Jul 25, 2025 at 12:55 AM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Jul 24, 2025 at 10:54 PM Hugo DUBOIS <hdubois(at)scaleway(dot)com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > I'm not sure if it's a bug but I've encountered an unexpected
> > > > behavior when dynamically changing the sync_replication_slots
> > > > parameter on a PostgreSQL 17 standby server. Instead of logging an
> > > > error and continuing to run, the standby instance shuts down with a
> > > > FATAL error, which is not the anticipated behavior for a dynamic
> > > > parameter change, especially when the documentation doesn't indicate
> > > > such an outcome.
> > > >
> > > > Steps to Reproduce
> > > >
> > > > Set up a physical replication between two PostgreSQL 17.5 instances.
> > > >
> > > > Ensure wal_level on the primary (and consequently on the standby) is
> > > > set to replica.
> > > >
> > > > Start both the primary and standby instances, confirming replication
> > > > is active.
> > > >
> > > > On the standby instance, dynamically change the
> > > > sync_replication_slots parameter (I have run the following query:
> > > > ALTER SYSTEM SET sync_replication_slots = 'on'; followed by
> > > > SELECT pg_reload_conf();)
> > > >
> > > > Expected Behavior
> > > >
> > > > I expected the standby instance to continue running and log an error
> > > > message (similar to how hot_standby_feedback behaves when not enabled,
> > > > e.g., a loop of LOG: replication slot synchronization requires
> > > > "hot_standby_feedback" to be enabled). A FATAL error leading to an
> > > > unexpected shutdown for a dynamic parameter change on a running
> > > > standby is not the anticipated behavior. The documentation for
> > > > sync_replication_slots also doesn't indicate that a misconfiguration
> > > > or incompatible wal_level would lead to a shutdown.
> > > >
> > > > Actual Behavior
> > > >
> > > > Upon attempting to set sync_replication_slots to on on the standby
> > > > with wal_level set to replica, the standby instance immediately shuts
> > > > down with the following log messages:
> > > >
> > > > LOG: database system is ready to accept read-only connections
> > > > LOG: started streaming WAL from primary at 0/3000000 on timeline 1
> > > > LOG: received SIGHUP, reloading configuration files
> > > > LOG: parameter "sync_replication_slots" changed to "on"
> > > > FATAL: replication slot synchronization requires "wal_level" >= "logical"
> > > >
> > > > Environment
> > > >
> > > > PostgreSQL Version: 17.5
> > >
> > > Thanks for the report!
> > >
> > > I was able to reproduce the issue even on the latest master (v19dev).
> > > I agree that the current behavior—where changing a GUC parameter can
> > > cause the server to shut down—is unexpected and should be avoided.
> > >
> > > From what I’ve seen in the code, the problem stems from postmaster
> > > calling ValidateSlotSyncParams() before starting the slot sync worker.
> > > That function raises an ERROR if wal_level is not logical while
> > > sync_replication_slots is enabled. Since ERROR is treated as FATAL
> > > in postmaster, it causes the server to exit.
> > >
> > > To fix this, we could modify ValidateSlotSyncParams() so it doesn’t
> > > raise an ERROR in this case, as follows.
> > >
> > >  ValidateSlotSyncParams(int elevel)
> > >  {
> > >      /*
> > >       * Logical slot sync/creation requires wal_level >= logical.
> > > -     *
> > > -     * Since altering the wal_level requires a server restart, so error out in
> > > -     * this case regardless of elevel provided by caller.
> > >       */
> > >      if (wal_level < WAL_LEVEL_LOGICAL)
> > > -        ereport(ERROR,
> > > +    {
> > > +        ereport(elevel,
> > >                  errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > >                  errmsg("replication slot synchronization requires \"wal_level\" >= \"logical\""));
> > > +        return false;
> > > +    }
> >
> > I've created a patch to implement the above—attached.
>
> Thank you for the patch.
>
> > Note that this patch does not change the existing behavior when
> > the misconfiguration (sync_replication_slots enabled but wal_level not
> > set to logical) is detected at server startup. In that case, the server
> > still shuts down with a FATAL error, which is consistent with other
> > settings like summarize_wal.
> >
>
> Validated the behaviour, the patch looks good to me.
>
> thanks
> Shveta
>

--

Best regards,

*Hugo DUBOIS*
*Devops Engineer*

hdubois(at)scaleway(dot)com
