Re: Add CHECK_FOR_INTERRUPTS in polling loop code path in XactLockTableWait

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Kevin K Biju <kevinkbiju(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add CHECK_FOR_INTERRUPTS in polling loop code path in XactLockTableWait
Date: 2025-06-03 10:44:54
Message-ID: CABPTF7Xtvt441zb1d3ZHGGE-st0O-WXBwoLca3C_2xAVfN5jPA@mail.gmail.com
Lists: pgsql-hackers

Hi Fujii,
Thanks a lot for reviewing!

> > This update amends the previous patch by replacing the global
> > XactLockTableWait_us with a local wait_us variable inside
> > XactLockTableWait. Now each call resets its backoff to 1 ms, doubling
> > up to 1 s when oper == XLTW_None. This eliminates shared‐state bugs
> > (where multiple backends or successive calls would otherwise inherit a
> > stale, high sleep value).
> >
> > I remain concerned about relying on oper == XLTW_None to signal
> > “logical‐replication” use cases, since:
> > It couples backoff behavior to the absence of an error‐callback
> > context. Future callers passing oper == None for other reasons would
> > inadvertently get progressive backoff.
> > The intent (“I need a long‐wait backoff”) isn’t self‐evident from oper.
>
> I think that this concern is right and we should avoid this approach.

+1

>
> > An alternative is to add a separate boolean or enum argument (e.g.
> > bool isLogicalReplication) so that callers explicitly opt in to
> > backoff. That approach is more verbose but avoids hidden coupling and
> > is easier to maintain if additional contexts require different wait
> > policies. If ConditionalXactLockTableWait ever needs backoff, it would
> > benefit from the same explicit flag.
>
> I prefer this approach over "oper == XLTW_None" one.

Agreed. My earlier hesitation was that a new argument means touching
many call sites, and that XLTW_Oper exists primarily for error-context
callbacks rather than for selecting a wait policy.
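To make the explicit opt-in concrete, here is a minimal sketch of what
such a parameter could look like. All names and constants below
(XactWaitPolicy, wait_step_us, the 1ms/1s bounds) are illustrative, not
from the posted patch:

```c
#include <assert.h>

/* Hypothetical sketch: callers opt in to backoff explicitly via a new
 * enum parameter instead of relying on oper == XLTW_None. */
typedef enum XactWaitPolicy
{
	XACT_WAIT_POLL,				/* fixed 1ms sleep each cycle */
	XACT_WAIT_BACKOFF			/* 1ms doubling up to 1s */
} XactWaitPolicy;

/* Compute the sleep (in microseconds) for the next polling cycle. */
static long
wait_step_us(XactWaitPolicy policy, long prev_us)
{
	long		next;

	if (policy == XACT_WAIT_POLL)
		return 1000;			/* always 1ms */
	next = prev_us * 2;
	return (next > 1000000) ? 1000000 : next;	/* cap at 1s */
}
```

An enum rather than a bool also leaves room for additional wait
policies later without another signature change.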

>
> Just idea, if XactLockTableWait() is expected to finish within a few seconds
> after acquiring the lock, how about this approach: keep sleeping for 1ms
> until the total sleep time reaches 10s (10s is just an example),
> and after that, start doubling the sleep duration each cycle, up to
> a maximum of 10s. That is, even in non-"create replication slot" case,
> if the total sleep time exceeds 10s, it seems safe to double the sleep time
> up to 10s. This way, we stay responsive early on but can back off more
> aggressively if needed. Thought? Anyway we probably need to study
> XactLockTableWait() behavior more closely.
>

I agree—we need to study XactLockTableWait() usage more closely.
Picking a fixed threshold could be tricky across different workloads,
and exposing the threshold as a GUC for this one case also feels
heavyweight.
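For discussion, the hybrid policy you describe could be sketched as a
pure function of elapsed sleep time. The names and the 10s constants
are placeholders matching your example, not proposed values:

```c
#include <assert.h>

/* Sketch of the suggested hybrid policy: poll at 1ms until the total
 * sleep time reaches a threshold (10s here, as in the example), then
 * double each cycle up to a 10s cap.  All values are illustrative. */
#define INITIAL_WAIT_US		1000L		/* 1ms polling interval */
#define BACKOFF_START_US	10000000L	/* start doubling after 10s total */
#define MAX_WAIT_US			10000000L	/* cap each sleep at 10s */

static long
next_wait_us(long total_slept_us, long prev_wait_us)
{
	long		next;

	if (total_slept_us < BACKOFF_START_US)
		return INITIAL_WAIT_US;
	next = prev_wait_us * 2;
	return (next > MAX_WAIT_US) ? MAX_WAIT_US : next;
}
```

One nice property of this shape is that the common short-wait case is
unchanged: backoff only kicks in once a caller has already been
waiting long enough that responsiveness matters less.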

> Thanks for the patch! I haven't reviewed it yet, but since this is
> a v19 item, please add it to the next CommitFest so we don't lose
> track of it.
>
> Also, I think it would be better to split the addition of the wait event
> and the introduction of exponential backoff in XactLockTableWait() into
> separate patches. They serve different purposes and can be committed
> independently.

Understood—I’ll split them and move both to July’s CommitFest.

>
> BTW, regarding the discussion on the list, please avoid top-posting;
> bottom-posting is the preferred style on this mailing list.
>

Good point—I’ll use bottom-posting going forward.
