RE: issue with synchronized_standby_slots

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: issue with synchronized_standby_slots
Date: 2025-09-05 04:07:32
Message-ID: TY4PR01MB16907C1E07F375149A2E5CDB69403A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Thursday, September 4, 2025 9:27 PM Fabrice Chapuis <fabrice636861(at)gmail(dot)com> wrote:
> We are running PG 17.5 with logical replication failover slots. When we tried to
> change the value of synchronized_standby_slots while node2 was not running, the
> error invalid value for parameter "synchronized_standby_slots": "node1,node2"
> was generated. The problem is that statements were affected by this and
> could not execute.
>
> STATEMENT: select service_period,sp1_0.address_line_1 from tbl1 where sp1_0.vn=$1 order by sp1_0.start_of_period
> 2025-08-24 13:14:29.417 CEST [848477]: [1-1] user=,db=,client=,application= ERROR: invalid value for parameter "synchronized_standby_slots": "node1,node2"
> 2025-08-24 13:14:29.417 CEST [848477]: [2-1] user=,db=,client=,application= DETAIL: replication slot "s029054a" does not exist
> 2025-08-24 13:14:29.417 CEST [848477]: [3-1] user=,db=,client=,application= CONTEXT: while setting parameter "synchronized_standby_slots" to "node1,node2"
> 2025-08-24 13:14:29.418 CEST [777453]: [48-1] user=,db=,client=,application= LOG: background worker "parallel worker" (PID 848476) exited with exit code 1
> 2025-08-24 13:14:29.418 CEST [777453]: [49-1] user=,db=,client=,application= LOG: background worker "parallel worker" (PID 848477) exited with exit code 1
>
> Has this issue already been observed?

Thank you for reporting this issue. It seems you added a nonexistent slot to
synchronized_standby_slots before server startup. The server does not verify
the existence of slots at startup because the slots' shared-memory information
is not yet available at that point, so the server starts successfully. However,
when a parallel worker starts, it re-verifies the GUC setting, resulting in the
ERROR you saw.

I think this scenario is not necessarily a bug, as adding nonexistent slots to
the GUC is disallowed. Such slots can block the advancement of logical failover
slots, increasing the risk of disk bloat due to retained WAL or dead rows, which
is why we added the ERROR. There are precedents for this kind of behavior, like
default_table_access_method and default_tablespace, which prevent queries from
running if invalid values are set before server startup.

To resolve the issue, you can remove the invalid slot from the GUC and add it
back after creating the physical slot.
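
A rough sketch of that sequence, run on the primary. It assumes that node2 is
the missing physical slot, that node1 already exists, and that the setting is
managed with ALTER SYSTEM; none of this is confirmed by the report. Editing
postgresql.conf and reloading should work the same way.

```sql
-- Assumption: node1 exists and node2 is the slot that is missing.
-- 1. Drop the nonexistent slot from the GUC and reload the configuration.
ALTER SYSTEM SET synchronized_standby_slots = 'node1';
SELECT pg_reload_conf();

-- 2. Create the physical slot for the standby.
SELECT pg_create_physical_replication_slot('node2');

-- 3. Confirm that both slots exist and are physical.
SELECT slot_name, slot_type
FROM pg_replication_slots
WHERE slot_name IN ('node1', 'node2');

-- 4. Add the slot back to the GUC and reload again.
ALTER SYSTEM SET synchronized_standby_slots = 'node1,node2';
SELECT pg_reload_conf();
```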

I also thought about how to improve the user experience here, but it's not
feasible to verify slot existence at startup because replication slot data has
not yet been restored to shared memory when the GUC checks run. Another option
might be to simply remove the slot existence/type checks from GUC validation.

Best Regards,
Hou zj
