Re: Add CHECK_FOR_INTERRUPTS in polling loop code path in XactLockTableWait

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Kevin K Biju <kevinkbiju(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add CHECK_FOR_INTERRUPTS in polling loop code path in XactLockTableWait
Date: 2025-06-03 04:35:11
Message-ID: CABPTF7WNA3epnLyP5+zegtZXS+uACavB3cueqm2pDLJ1T9_w6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

This update amends the previous patch by replacing the global
XactLockTableWait_us with a local wait_us variable inside
XactLockTableWait. Now each call resets its backoff to 1 ms, doubling
up to 1 s when oper == XLTW_None. This eliminates shared‐state bugs
(where multiple backends or successive calls would otherwise inherit a
stale, high sleep value).

I remain concerned about relying on oper == XLTW_None to signal
“logical‐replication” use cases, since:
It couples backoff behavior to the absence of an error‐callback
context. Future callers passing oper == None for other reasons would
inadvertently get progressive backoff.
The intent (“I need a long‐wait backoff”) isn’t self‐evident from oper.

An alternative is to add a separate boolean or enum argument (e.g.
bool isLogicalReplication) so that callers explicitly opt in to
backoff. That approach is more verbose but avoids hidden coupling and
is easier to maintain if additional contexts require different wait
policies. If ConditionalXactLockTableWait ever needs backoff, it would
benefit from the same explicit flag.

On Tue, Jun 3, 2025 at 11:32 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi all,
>
> Thank you for the earlier pushed patch—after testing it, things work
> well. Per Fujii’s and Kevin’s ideas and suggestions[1], [2], I’ve
> tried to extend it to add a new wait event and a progressive backoff
> mechanism for XactLockTableWait in logical replication scenarios.
>
> New wait event
>
> • XACT_DONE (“Waiting for a transaction to commit or abort”):
>
> This shows up when a standby is blocked waiting for the primary’s XID replay.
>
> Progressive backoff
>
> • When XactLockTableWait is called from logical replication (oper ==
> XLTW_None), the sleep interval doubles each loop up to one second.
>
> • ConditionalXactLockTableWait still uses a fixed 1 ms sleep, since it
> is only invoked by heap and index operations.
>
>
> With the patch applied, a blocked slot‐creation session on standby now shows:
>
> testdb=# SELECT pid, wait_event_type, wait_event, state, query FROM
> pg_stat_activity WHERE pid = 62774;
> pid | wait_event_type | wait_event | state |
> query
>
> -------+-----------------+------------+--------+------------------------------------------------------------------
>
> 62774 | IPC | XactDone | active | SELECT *
>
> | | | | FROM
> pg_create_logical_replication_slot('my_slot','pgoutput');
>
> (1 row)
>
> Please let me know if you have any feedback or suggestions.
>
> [1] https://www.postgresql.org/message-id/422ea29f-2051-441c-aa5d-5eea04a81b95%40oss.nttdata.com
> [2] https://www.postgresql.org/message-id/CAM45KeF6%2B3GtR7dDTAM-kwpGWpnUmAeg%3Djh0N3cd%3DNGBgvVkkA%40mail.gmail.com

Attachment Content-Type Size
0002-Add-wait-event-and-progressive-backoff-to-XactLockTable.patch application/octet-stream 3.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-06-03 04:38:49 Re: pg_upgrade: warn about roles with md5 passwords
Previous Message Michael Paquier 2025-06-03 04:27:15 Re: Encapsulate io_uring process count calculation