From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Kevin K Biju <kevinkbiju(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add CHECK_FOR_INTERRUPTS in polling loop code path in XactLockTableWait
Date: 2025-06-03 10:44:54
Message-ID: CABPTF7Xtvt441zb1d3ZHGGE-st0O-WXBwoLca3C_2xAVfN5jPA@mail.gmail.com
Lists: pgsql-hackers
Hi Fujii,
Thanks a lot for reviewing!
> > This update amends the previous patch by replacing the global
> > XactLockTableWait_us with a local wait_us variable inside
> > XactLockTableWait. Now each call resets its backoff to 1 ms, doubling
> > up to 1 s when oper == XLTW_None. This eliminates shared‐state bugs
> > (where multiple backends or successive calls would otherwise inherit a
> > stale, high sleep value).
> >
> > I remain concerned about relying on oper == XLTW_None to signal
> > “logical-replication” use cases, since:
> > - It couples backoff behavior to the absence of an error-callback
> >   context; future callers passing oper == XLTW_None for other reasons
> >   would inadvertently get progressive backoff.
> > - The intent (“I need a long-wait backoff”) isn’t self-evident from oper.
>
> I think that this concern is right and we should avoid this approach.
+1
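
For anyone following along, this is roughly the loop shape the amended
patch describes, simplified for illustration (the real function also
acquires and releases the xid lock on each iteration, which I have
omitted here; the 1 ms start and 1 s cap are the values quoted above):

    void
    XactLockTableWait(TransactionId xid, Relation rel,
                      ItemPointer ctid, XLTW_Oper oper)
    {
        long    wait_us = 1000L;    /* local: reset to 1 ms on every call */

        for (;;)
        {
            /* the interrupt check this thread is about */
            CHECK_FOR_INTERRUPTS();

            if (!TransactionIdIsInProgress(xid))
                break;

            pg_usleep(wait_us);

            /* double only when there is no error-callback context */
            if (oper == XLTW_None)
                wait_us = Min(wait_us * 2, 1000000L);   /* cap at 1 s */
        }
    }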
>
> > An alternative is to add a separate boolean or enum argument (e.g.
> > bool isLogicalReplication) so that callers explicitly opt in to
> > backoff. That approach is more verbose but avoids hidden coupling and
> > is easier to maintain if additional contexts require different wait
> > policies. If ConditionalXactLockTableWait ever needs backoff, it would
> > benefit from the same explicit flag.
>
> I prefer this approach over the "oper == XLTW_None" one.
Agreed. My worry was that introducing a new argument means touching
many call sites, and that XLTW_Oper is meant primarily for
error-context callbacks.
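
For clarity, the explicit-flag variant would look something like this
(isLogicalReplication is just the example name from upthread, and the
call sites are illustrative; a more intent-revealing name may be better):

    extern void XactLockTableWait(TransactionId xid, Relation rel,
                                  ItemPointer ctid, XLTW_Oper oper,
                                  bool isLogicalReplication);

    /* existing callers keep today's fixed 1 ms polling */
    XactLockTableWait(xid, relation, &tuple->t_self, XLTW_Update, false);

    /* the snapshot-building path opts in to progressive backoff */
    XactLockTableWait(xid, NULL, NULL, XLTW_None, true);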
>
> Just an idea: if XactLockTableWait() is expected to finish within a few
> seconds after acquiring the lock, how about this approach: keep sleeping
> for 1 ms until the total sleep time reaches 10 s (10 s is just an example),
> and after that, start doubling the sleep duration each cycle, up to
> a maximum of 10 s. That is, even in the non-"create replication slot" case,
> if the total sleep time exceeds 10 s, it seems safe to double the sleep time
> up to 10 s. This way, we stay responsive early on but can back off more
> aggressively if needed. Thoughts? Anyway, we probably need to study
> XactLockTableWait() behavior more closely.
>
I agree that we need to study XactLockTableWait() usage more closely.
Picking a fixed threshold could be tricky across different workloads,
and exposing it as a GUC just for this also feels heavyweight.
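
For reference, the hybrid schedule you describe would look roughly like
this (the 10 s numbers are the examples from your mail, not settled
values):

    int64   total_us = 0;
    long    wait_us = 1000L;                    /* 1 ms */

    for (;;)
    {
        CHECK_FOR_INTERRUPTS();

        if (!TransactionIdIsInProgress(xid))
            break;

        pg_usleep(wait_us);
        total_us += wait_us;

        /* stay responsive early; back off once we've waited a long time */
        if (total_us >= 10000000L)                  /* 10 s total so far */
            wait_us = Min(wait_us * 2, 10000000L);  /* cap each sleep at 10 s */
    }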
> Thanks for the patch! I haven't reviewed it yet, but since this is
> a v19 item, please add it to the next CommitFest so we don't lose
> track of it.
>
> Also, I think it would be better to split the addition of the wait event
> and the introduction of exponential backoff in XactLockTableWait() into
> separate patches. They serve different purposes and can be committed
> independently.
Understood. I'll split them into two separate patches and add both to the July CommitFest.
>
> BTW, regarding the discussion on the list, please avoid top-posting;
> bottom-posting is the preferred style on this mailing list.
>
Good point. I'll use bottom-posting going forward.