Re: Unexpected Standby Shutdown on sync_replication_slots change

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Hugo DUBOIS <hdubois(at)scaleway(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Unexpected Standby Shutdown on sync_replication_slots change
Date: 2025-07-24 18:50:07
Message-ID: CAHGQGwFG8qB_vYbP-2Ec=WA4q=kzjMeN0F8Z1JZMbYku6qzq6w@mail.gmail.com
Lists: pgsql-bugs

On Fri, Jul 25, 2025 at 12:55 AM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Thu, Jul 24, 2025 at 10:54 PM Hugo DUBOIS <hdubois(at)scaleway(dot)com> wrote:
> >
> > Hello,
> >
> > I'm not sure if it's a bug but I've encountered an unexpected behavior when dynamically changing the sync_replication_slots parameter on a PostgreSQL 17 standby server. Instead of logging an error and continuing to run, the standby instance shuts down with a FATAL error, which is not the anticipated behavior for a dynamic parameter change, especially when the documentation doesn't indicate such an outcome.
> >
> > Steps to Reproduce
> >
> > Set up a physical replication between two PostgreSQL 17.5 instances.
> >
> > Ensure wal_level on the primary (and consequently on the standby) is set to replica.
> >
> > Start both the primary and standby instances, confirming replication is active.
> >
> > On the standby instance, dynamically change the sync_replication_slots parameter (I have run the following query: ALTER SYSTEM SET sync_replication_slots = 'on'; followed by SELECT pg_reload_conf();)
> >
> > Expected Behavior
> >
> > I expected the standby instance to continue running and log an error message (similar to how hot_standby_feedback behaves when not enabled, e.g., a loop of LOG: replication slot synchronization requires "hot_standby_feedback" to be enabled). A FATAL error leading to an unexpected shutdown for a dynamic parameter change on a running standby is not the anticipated behavior. The documentation for sync_replication_slots also doesn't indicate that a misconfiguration or incompatible wal_level would lead to a shutdown.
> >
> > Actual Behavior
> >
> > Upon attempting to set sync_replication_slots to on on the standby with wal_level set to replica, the standby instance immediately shuts down with the following log messages:
> >
> > LOG: database system is ready to accept read-only connections
> > LOG: started streaming WAL from primary at 0/3000000 on timeline 1
> > LOG: received SIGHUP, reloading configuration files
> > LOG: parameter "sync_replication_slots" changed to "on"
> > FATAL: replication slot synchronization requires "wal_level" >= "logical"
> >
> > Environment
> >
> > PostgreSQL Version: 17.5
>
> Thanks for the report!
>
> I was able to reproduce the issue even on the latest master (v19dev).
> I agree that the current behavior—where changing a GUC parameter can
> cause the server to shut down—is unexpected and should be avoided.
>
> From what I’ve seen in the code, the problem stems from postmaster
> calling ValidateSlotSyncParams() before starting the slot sync worker.
> That function raises an ERROR if wal_level is not logical while
> sync_replication_slots is enabled. Since ERROR is treated as FATAL
> in postmaster, it causes the server to exit.
>
> To fix this, we could modify ValidateSlotSyncParams() so it doesn’t
> raise an ERROR in this case, as follows.
>
>  ValidateSlotSyncParams(int elevel)
>  {
>      /*
>       * Logical slot sync/creation requires wal_level >= logical.
> -     *
> -     * Since altering the wal_level requires a server restart, so error out in
> -     * this case regardless of elevel provided by caller.
>       */
>      if (wal_level < WAL_LEVEL_LOGICAL)
> -        ereport(ERROR,
> +    {
> +        ereport(elevel,
>                  errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                  errmsg("replication slot synchronization requires \"wal_level\" >= \"logical\""));
> +        return false;
> +    }

I've attached a patch that implements the above.

Note that this patch does not change the existing behavior when
the misconfiguration (sync_replication_slots enabled but wal_level not
set to logical) is detected at server startup. In that case, the server
still shuts down with a FATAL error, which is consistent with other
settings like summarize_wal.
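
To illustrate the intended difference between the two cases, here is a
minimal standalone sketch. It is not PostgreSQL code and not the attached
patch: every sketch_* name is hypothetical, the two severity levels are
simplified stand-ins for elevel values, and the "server would exit" flag only
models the postmaster treating ERROR as FATAL. That the startup path passes
the error-level severity is an assumption made for this illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for PostgreSQL severity levels and wal_level values. */
enum sketch_elevel { SKETCH_LOG, SKETCH_ERROR };
enum sketch_wal_level { SKETCH_WAL_MINIMAL, SKETCH_WAL_REPLICA, SKETCH_WAL_LOGICAL };

/*
 * Models the postmaster treating ERROR as FATAL: once a report is made at
 * SKETCH_ERROR, the (simulated) server would shut down.
 */
static bool sketch_server_would_exit = false;

static void
sketch_report(enum sketch_elevel elevel, const char *msg)
{
    fprintf(stderr, "%s: %s\n", elevel == SKETCH_ERROR ? "FATAL" : "LOG", msg);
    if (elevel == SKETCH_ERROR)
        sketch_server_would_exit = true;
}

/*
 * Report at the severity chosen by the caller and return false, instead of
 * always reporting at error level. The caller decides whether the
 * misconfiguration is fatal (assumed: at startup) or only logged (assumed:
 * after a configuration reload).
 */
static bool
sketch_validate_slot_sync(enum sketch_wal_level wal_level,
                          enum sketch_elevel elevel)
{
    if (wal_level < SKETCH_WAL_LOGICAL)
    {
        sketch_report(elevel,
                      "replication slot synchronization requires "
                      "\"wal_level\" >= \"logical\"");
        return false;
    }
    return true;
}
```

In this sketch, calling sketch_validate_slot_sync(SKETCH_WAL_REPLICA,
SKETCH_LOG) returns false and leaves sketch_server_would_exit unset, which
corresponds to the reload case where the slot sync worker is simply not
started. Passing SKETCH_ERROR sets the flag, which corresponds to the
startup case.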

Regards,

--
Fujii Masao

Attachment Content-Type Size
v1-0001-Avoid-unexpected-shutdown-when-sync_replication_s.patch application/octet-stream 2.7 KB
