From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Kevin K Biju <kevinkbiju(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add CHECK_FOR_INTERRUPTS in polling loop code path in XactLockTableWait
Date: 2025-06-03 06:55:45
Message-ID: 2b7e25fb-1ad1-45c0-9643-605867421d9f@oss.nttdata.com
Lists: pgsql-hackers
On 2025/06/03 13:35, Xuneng Zhou wrote:
> Hi all,
>
> This update amends the previous patch by replacing the global
> XactLockTableWait_us with a local wait_us variable inside
> XactLockTableWait. Now each call resets its backoff to 1 ms, doubling
> up to 1 s when oper == XLTW_None. This eliminates shared‐state bugs
> (where multiple backends or successive calls would otherwise inherit a
> stale, high sleep value).
>
> I remain concerned about relying on oper == XLTW_None to signal
> “logical‐replication” use cases, since:
>
> - It couples backoff behavior to the absence of an error‐callback
> context. Future callers passing oper == XLTW_None for other reasons
> would inadvertently get progressive backoff.
> - The intent (“I need a long‐wait backoff”) isn’t self‐evident from oper.
I think this concern is valid, and we should avoid this approach.
> An alternative is to add a separate boolean or enum argument (e.g.
> bool isLogicalReplication) so that callers explicitly opt in to
> backoff. That approach is more verbose but avoids hidden coupling and
> is easier to maintain if additional contexts require different wait
> policies. If ConditionalXactLockTableWait ever needs backoff, it would
> benefit from the same explicit flag.
I prefer this approach over "oper == XLTW_None" one.
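For illustration, the explicit opt-in could look roughly like the following. This is only a sketch of the backoff schedule under the numbers mentioned upthread (fixed 1 ms polling by default; doubling up to 1 s when the caller opts in); the helper name next_sleep_us and its signature are hypothetical, not part of the actual XactLockTableWait code:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: compute the next sleep duration in microseconds.
 * With useBackoff = false we keep the current fixed 1 ms polling; with
 * useBackoff = true (e.g. a logical-replication caller opting in via an
 * explicit flag) the sleep doubles each cycle, capped at 1 s.
 */
static long
next_sleep_us(bool useBackoff, long cur_us)
{
	if (!useBackoff)
		return 1000L;			/* fixed 1 ms, current behavior */
	if (cur_us * 2 > 1000000L)
		return 1000000L;		/* cap at 1 s */
	return cur_us * 2;			/* double each cycle */
}
```

An explicit boolean (or enum) argument keeps the wait policy visible at every call site, so a future caller passing oper == XLTW_None for unrelated reasons cannot silently inherit the backoff.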
Just an idea: if XactLockTableWait() is normally expected to acquire the lock
within a few seconds, how about this approach: keep sleeping for 1ms until
the total sleep time reaches 10s (10s is just an example), and after that
start doubling the sleep duration each cycle, up to a maximum of 10s.
That is, even outside the "create replication slot" case, once the total
sleep time exceeds 10s it seems safe to double the sleep time up to 10s.
This way we stay responsive early on but can back off more aggressively
when needed. Thoughts? In any case, we probably need to study
XactLockTableWait()'s behavior more closely.
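A sketch of that schedule, using the numbers above (fixed 1 ms polling until 10 s of total sleep, then doubling capped at 10 s); the function name and signature are made up for illustration and the thresholds are the examples from this message, not settled values:

```c
/*
 * Hypothetical sketch of the suggested schedule: poll every 1 ms until
 * the total sleep time reaches 10 s, then double the sleep duration each
 * cycle, up to a 10 s maximum.
 */
static long
next_sleep_us(long total_slept_us, long cur_us)
{
	if (total_slept_us < 10L * 1000000L)
		return 1000L;				/* stay responsive: fixed 1 ms */
	if (cur_us * 2 > 10L * 1000000L)
		return 10L * 1000000L;		/* cap at 10 s */
	return cur_us * 2;				/* double each cycle */
}
```

The appeal of this variant is that no caller needs to opt in: short waits behave exactly as today, and the backoff only kicks in for waits already long enough that polling latency no longer matters.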
> On Tue, Jun 3, 2025 at 11:32 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>>
>> Hi all,
>>
>> Thank you for the patch pushed earlier; after testing it, things work
>> well. Following Fujii's and Kevin's ideas and suggestions [1], [2], I've
>> tried to extend it to add a new wait event and a progressive backoff
>> mechanism for XactLockTableWait in logical replication scenarios.
Thanks for the patch! I haven't reviewed it yet, but since this is
a v19 item, please add it to the next CommitFest so we don't lose
track of it.
Also, I think it would be better to split the addition of the wait event
and the introduction of exponential backoff in XactLockTableWait() into
separate patches. They serve different purposes and can be committed
independently.
BTW, regarding the discussion on the list, please avoid top-posting;
bottom-posting is the preferred style on this mailing list.
Regards,
--
Fujii Masao
NTT DATA Japan Corporation