Re: issue with synchronized_standby_slots

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
Cc: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: issue with synchronized_standby_slots
Date: 2025-09-08 15:51:08
Message-ID: CAHGQGwE3wOmstHgiWzuCLKJXEJ8BHmtET+5tRiLmG3QKfhhbwQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 8, 2025 at 6:26 PM Alexander Kukushkin <cyberdemn(at)gmail(dot)com> wrote:
>
> Hi,
>
>
> On Sun, 7 Sept 2025 at 10:15, Fabrice Chapuis <fabrice636861(at)gmail(dot)com> wrote:
>>
>> Thanks for your reply Zhijie,
>>
>> I understand that the error "invalid value for parameter" will be displayed if the GUC synchronized_standby_slots has a bad value or if a configured standby node is not up and running.
>> But the problem I noticed is that statements could not execute normally and an error code was returned to the application.
>> This happened after an upgrade from PG 14 to PG 17.
>> I could try to reproduce the issue.
>
>
>>
>> > STATEMENT: select service_period,sp1_0.address_line_1 from tbl1 where sp1_0.vn=$1 order by sp1_0.start_of_period
>>>
>>> > 2025-08-24 13:14:29.417 CEST [848477]: [1-1] user=,db=,client=,application= ERROR: invalid value for parameter "synchronized_standby_slots": "node1,node2"
>>> > 2025-08-24 13:14:29.417 CEST [848477]: [2-1] user=,db=,client=,application= DETAIL: replication slot "s029054a" does not exist
>>> > 2025-08-24 13:14:29.417 CEST [848477]: [3-1] user=,db=,client=,application= CONTEXT: while setting parameter "synchronized_standby_slots" to "node1,node2"
>>> > 2025-08-24 13:14:29.418 CEST [777453]: [48-1] user=,db=,client=,application= LOG: background worker "parallel worker" (PID 848476) exited with exit code 1
>>> > 2025-08-24 13:14:29.418 CEST [777453]: [49-1] user=,db=,client=,application= LOG: background worker "parallel worker" (PID 848477) exited with exit code 1
>>> >
>>> > Has this issue already been observed?
>
>
> Recently we also hit this problem.
>
> I think in their current state the check_synchronized_standby_slots() and validate_sync_standby_slots() functions are not very useful:
> - When the hook is executed from the postmaster it only checks that synchronized_standby_slots contains a valid list, but doesn't check that the replication slots exist, because MyProc is NULL. This happens both on start and on reload.

This looks quite problematic. If a non-existent slot is specified in
synchronized_standby_slots in postgresql.conf and the configuration file
is reloaded, no error is reported. This happens because the postmaster
cannot detect that the slot doesn't exist (since it has no MyProc).
As a result, synchronized_standby_slots in the postmaster is set to
that slot. New backends then inherit this setting from the postmaster,
while already running backends correctly detect that the slot doesn't
exist and fail to apply it.

This leads to an inconsistent state: the reload succeeds with no error,
but some backends apply the new setting while others do not.
That inconsistency seems like an issue.
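
To make the sequence concrete, here is a toy model in Python. It is not
PostgreSQL code: check_hook, Process, and EXISTING_SLOTS are hypothetical
names, and the only behavior modeled is the one described above (the
existence check is skipped when there is no MyProc).

```python
# Toy model only: these names are hypothetical and are not PostgreSQL internals.
EXISTING_SLOTS = {"node1"}          # slots that actually exist on the primary


def check_hook(value, has_myproc):
    """Mimic the behavior described above: the slot-existence check runs
    only when the process has a MyProc (i.e. it is not the postmaster)."""
    names = [n.strip() for n in value.split(",")]
    if not all(names):               # syntax check: no empty list elements
        return False
    if has_myproc:                   # existence check is skipped in the postmaster
        return all(n in EXISTING_SLOTS for n in names)
    return True


class Process:
    def __init__(self, setting, has_myproc):
        self.setting = setting
        self.has_myproc = has_myproc

    def reload(self, new_value):
        # A rejected value leaves the old setting in place.
        if check_hook(new_value, self.has_myproc):
            self.setting = new_value


postmaster = Process("node1", has_myproc=False)
old_backend = Process("node1", has_myproc=True)

new_value = "node1,node2"            # node2 does not exist
postmaster.reload(new_value)         # accepted: existence cannot be checked
old_backend.reload(new_value)        # rejected: node2 is missing

# A backend started after the reload inherits the postmaster's value.
new_backend = Process(postmaster.setting, has_myproc=True)

print(postmaster.setting, old_backend.setting, new_backend.setting)
```

In this model the already running backend keeps "node1" while the
postmaster and the new backend end up with "node1,node2", which is the
split described above.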

To fix this, ISTM that the GUC check hook for synchronized_standby_slots
should be revised so it doesn't rely on MyProc or perform slot existence
checks there....
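
As a rough illustration of that direction, a check that looks only at the
syntax of the value gives the same answer in every process. This is a
Python sketch, not the actual hook: check_syntax_only and SLOT_NAME_RE are
hypothetical, and the name rule used here (lower-case letters, digits,
underscore, at most 63 characters) is an assumption for the example.

```python
import re

# Assumed slot-name rule for this sketch; not taken from the PostgreSQL source.
SLOT_NAME_RE = re.compile(r"[a-z0-9_]{1,63}")


def check_syntax_only(value):
    """Return True if value is a well-formed comma-separated slot list.

    The function never looks at which slots exist, so the postmaster and
    every backend reach the same verdict for the same value.
    """
    if value.strip() == "":
        return True                  # empty setting: nothing to validate
    # Every element must be a non-empty, well-formed name.
    return all(SLOT_NAME_RE.fullmatch(n.strip()) for n in value.split(","))
```

With this shape, "node1,node2" is accepted everywhere whether or not node2
exists, and "node1,,node2" is rejected everywhere. Where the existence of
the slots would then be verified is left open here.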

Regards,

--
Fujii Masao
