| From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
|---|---|
| To: | 'Fujii Masao' <masao(dot)fujii(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | RE: logical apply worker's lock waits in subscriber can stall checkpointer in publisher |
| Date: | 2026-01-29 07:03:29 |
| Message-ID: | TY7PR01MB145549C44DB50705E0E3D3DCAF59EA@TY7PR01MB14554.jpnprd01.prod.outlook.com |
| Lists: | pgsql-hackers |
Dear Fujii-san,
> While reviewing the patch at [1], I noticed a case where lock waits on
> a logical apply worker in the subscriber can cause the checkpointer on
> the publisher to stall. This seems like unexpected behavior and may
> need to be addressed.
>
> The issue can occur as follows:
>
> 1. A logical apply worker on the subscriber blocks waiting for a lock.
> 2. Because the apply worker cannot receive further messages, the walsender's
> send buffer on the publisher becomes full.
> 3. If the walsender then encounters a max_slot_wal_keep_size error,
> it attempts to send an error message to the subscriber before exiting.
> However, with a full send buffer, the walsender blocks while trying to
> send this message.
> 4. The checkpointer on the publisher calls InvalidateObsoleteReplicationSlots()
> and waits for the slot to be released. Since the walsender is stuck and
> the slot is not released, the checkpointer also becomes stuck.
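
As a rough illustration of steps 2 and 3 outside PostgreSQL, here is a minimal Python sketch of a sender whose peer never reads. It uses a local socket pair rather than a real replication connection, and the function name and parameters are hypothetical. The socket is put in non-blocking mode only so that the point where a blocking send would hang becomes visible as an exception.

```python
import socket


def fill_send_buffer(chunk_size: int = 65536, max_chunks: int = 10000) -> int:
    """Write to a peer that never reads until the kernel refuses more data.

    Returns the number of bytes accepted before the send buffer filled up.
    """
    # Hypothetical stand-ins: "sender" plays the walsender, "receiver" plays
    # an apply worker that is blocked and therefore never calls recv().
    sender, receiver = socket.socketpair()
    try:
        sender.setblocking(False)
        payload = b"x" * chunk_size
        sent_total = 0
        for _ in range(max_chunks):
            try:
                sent_total += sender.send(payload)
            except BlockingIOError:
                # The buffer is full. A blocking socket would wait here
                # until the peer reads, which in this scenario never happens.
                return sent_total
        raise RuntimeError("send buffer never filled")
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print("bytes accepted before the send would block:", fill_send_buffer())
```

The exact byte count depends on the operating system's buffer sizes, so it is not shown here.
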
I confirmed that this can happen when max_slot_wal_keep_size is enabled
(in other words, when the value is not -1).
In my testing, wal_sender_timeout does not help because the process is stuck at
a lower layer, but tcp_user_timeout can terminate the process. Could we document
this workaround instead of fixing the code?
It does not work for Unix-domain socket connections, but such connections are
not realistic for replication in production.
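
For reference, the workaround would amount to a single setting on the publisher. The fragment below is only a sketch: the value is an arbitrary example, not a tested recommendation, and the parameter only takes effect on platforms that support TCP_USER_TIMEOUT.

```
# postgresql.conf on the publisher (illustrative value only)
# Without units the value is taken as milliseconds; 0 (the default)
# selects the operating system's default behavior.
tcp_user_timeout = 60000
```
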
Best regards,
Hayato Kuroda
FUJITSU LIMITED