From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Kevin K Biju <kevinkbiju(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add progressive backoff to XactLockTableWait functions
Date: 2025-07-30 15:42:10
Message-ID: CABPTF7XOnyfU09CnCpCth2sPNg2NkWcCBMNovU-tbOAFYM0nhw@mail.gmail.com
Lists: pgsql-hackers
Hi,
On Mon, Jul 28, 2025 at 7:14 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Thu, Jul 17, 2025 at 10:54 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > After studying procarray and lmgr more closely, I have found several
> > critical issues in the two patches and underestimated their complexity
> > and subtlety. Sorry for posting immature patches that may have created
> > noise.
> >
> > I now realize that making the lock acquisition in XactLockTableWait,
> > i.e. (void) LockAcquire(&tag, ShareLock, false, false), work on a
> > standby, following Andres' first suggestion, may be simpler and
> > require fewer new helpers.
> >
> > XactLockTableWait fails on a standby simply because we never call
> > XactLockTableInsert there. I am considering invoking
> > XactLockTableInsert when an XID is added to KnownAssignedXids, and
> > XactLockTableDelete (which today is never used for main XIDs on the
> > primary) when it is removed. A preliminary implementation and test
> > show that this approach works in principle, but I still need to
> > confirm that it is safe on a standby and does not break anything
> > else.
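> >
> > In sketch form, the idea is just the following (hand-simplified here;
> > the exact placement relative to the array updates is the open
> > question discussed below):
> >
> >     /* startup process, when an xid enters KnownAssignedXids */
> >     XactLockTableInsert(xid);
> >
> >     /* startup process, when the xid is removed as the xact ends */
> >     XactLockTableDelete(xid);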
>
> Adding an xact lock for every tracked assigned XID on a standby (by
> enabling XactLockTableInsert and XactLockTableDelete in the
> KnownAssignedXids infrastructure) appears less attractive than I first
> thought. The main concern is that the actual rate of xact-lock usage
> may not justify the added overhead, even if that overhead is modest.
>
> 1) Usage
> On a primary server, an xact lock is taken for every assigned XID, and
> the cost is justified because xact locks are referenced in many code
> paths. On a standby, however (especially in versions where logical
> decoding is disabled), xact locks are used far less, if at all. The
> current call stacks for XactLockTableWait on a standby (HEAD) are
> listed in [1]. The only high-frequency caller is the logical walsender
> on a cascading standby; all other callers are infrequent. Does that
> single use case warrant creating an xact lock for every known assigned
> XID? I don't think so, given that typical configurations have few
> cascading standbys. In practice, most xact locks may never be touched
> after they are created.
>
> 2) Cost
> Low usage would be fine if the additional cost were negligible, but
> that does not appear to be the case AFAICS.
>
> * There are 5 main uses of the KnownAssignedXids data structure:
> * * backends taking snapshots - all valid XIDs need to be copied out
> * * backends seeking to determine presence of a specific XID
> * * startup process adding new known-assigned XIDs
> * * startup process removing specific XIDs as transactions end
> * * startup process pruning array when special WAL records arrive
> *
> * This data structure is known to be a hot spot during Hot Standby, so we
> * go to some lengths to make these operations as efficient and as concurrent
> * as possible.
>
> KnownAssignedXids is a hot spot, as the comments state. The existing
> design replaces a hash table with a single array and minimizes
> exclusive locks and memory barriers to preserve concurrency. To
> respect that design, calls to XactLockTableInsert and
> XactLockTableDelete would need to occur outside any exclusive lock.
> Unfortunately, that re-introduces the polling we are trying to
> eliminate in XactLockTableWait: there is a window between registering
> the XIDs and adding their xact locks. During that window, a backend
> calling XactLockTableWait may need to poll, because
> TransactionIdIsInProgress shows the transaction as running while the
> xact lock is still missing. If the window were short, this might be
> acceptable, but it can be lengthy because XIDs are inserted into and
> deleted from the array in bulk. Some unlucky backends will therefore
> spin before they can actually wait.
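>
> To make the window concrete, here is roughly what the wait loop in
> XactLockTableWait does today (simplified from lmgr.c), with a comment
> marking where the gap bites:
>
>     for (;;)
>     {
>         SET_LOCKTAG_TRANSACTION(tag, xid);
>
>         /*
>          * If the startup process has put xid into KnownAssignedXids
>          * but has not yet inserted its xact lock, this acquisition
>          * succeeds immediately instead of blocking.
>          */
>         (void) LockAcquire(&tag, ShareLock, false, false);
>         LockRelease(&tag, ShareLock, false);
>
>         if (!TransactionIdIsInProgress(xid))
>             break;          /* transaction has truly ended */
>
>         pg_usleep(1000L);   /* still in progress: we are polling again */
>     }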
>
> Placing the XactLockTableInsert/Delete calls inside the exclusive lock
> removes the race window but hurts concurrency, the very issue this
> module strives to avoid. Operations on the shared-memory hash table in
> lmgr are slower and hold the lock longer than simple array updates, so
> backends invoking GetSnapshotData could experience longer wait times.
>
> 3) Alternative
> Adopt the hash-table + condition-variable design from patch v6, with
> additional fixes and polish.
> Backends that call XactLockTableWait() sleep on the condition
> variable. The startup process broadcasts a wake-up as the XID is
> removed from KnownAssignedXids. The broadcast happens after releasing
> the exclusive lock. This appears safe: sleeping backends remain
> blocked until the CV signal arrives, so no extra polling occurs even
> though the XID has already vanished from the array. For consistency,
> all removal paths (including those used during shutdown or promotion,
> where contention is minimal) follow the same pattern. The hash table
> is sized at 2 × MaxBackends, which is already conservative for today's
> workload. If more concurrent use cases appear in the future, it could
> be enlarged to something like TOTAL_MAX_CACHED_SUBXIDS; even that
> would consume only a few megabytes.
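>
> In sketch form, the design looks roughly like this (XidWaitEntry is
> the struct from the patch; XidWaitHash and WAIT_EVENT_XID_WAIT are
> placeholder names here, and the hash table's own locking is omitted):
>
>     typedef struct XidWaitEntry
>     {
>         TransactionId     xid;  /* hash key */
>         ConditionVariable cv;   /* waiters sleep on this */
>     } XidWaitEntry;
>
>     /* Waiter side, in XactLockTableWait() on a standby: */
>     entry = (XidWaitEntry *) hash_search(XidWaitHash, &xid,
>                                          HASH_ENTER, &found);
>     if (!found)
>         ConditionVariableInit(&entry->cv);
>     ConditionVariablePrepareToSleep(&entry->cv);
>     while (TransactionIdIsInProgress(xid))
>         ConditionVariableSleep(&entry->cv, WAIT_EVENT_XID_WAIT);
>     ConditionVariableCancelSleep();
>
>     /* Startup process, after the KnownAssignedXids removal and
>      * after releasing the exclusive ProcArrayLock: */
>     entry = (XidWaitEntry *) hash_search(XidWaitHash, &xid,
>                                          HASH_FIND, NULL);
>     if (entry)
>         ConditionVariableBroadcast(&entry->cv);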
>
> Feedback welcome.
>
> [1] https://www.postgresql.org/message-id/CABPTF7Wbp7MRPGsqd9NA4GbcSzUcNz1ymgWfir=Yf+N0oDRbjA@mail.gmail.com
Because xact locks are not maintained on a standby server, and the
current waiting approach does not rely on them either, I added
assertions to both functions to prevent their use on a standby and
introduced an explicit primary-versus-standby branch in
SnapBuildWaitSnapshot in v8.
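
Roughly, the branch looks like this (StandbyXidWait is a placeholder
name for the CV-based wait helper, not necessarily what the patch
calls it):

    if (!RecoveryInProgress())
        XactLockTableWait(xid, NULL, NULL, XLTW_None);  /* xact-lock path */
    else
        StandbyXidWait(xid);    /* CV-based wait on a standby */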

In procarray.c, two corrections were made compared with v7:

- Removed the seemingly redundant "initialized" field from XidWaitEntry.
- Removed the erroneous InHotStandby checks in the wait and wake helpers.

With these changes applied, here are some perf stats for v8; the
counts are about an order of magnitude smaller than those on HEAD.
Performance counter stats for process id '3273840':

        351,183,876    cycles
        126,586,090    instructions        # 0.36 insn per cycle
         16,670,633    cache-misses
             10,030    context-switches

      100.001419856 seconds time elapsed
Best,
Xuneng
Attachments:
  v8-0001-Optimize-transaction-waiting-in-standby.patch (application/octet-stream, 14.5 KB)
  v8.svg (image/svg+xml, 100.1 KB)