Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kirill Reshke <reshkekirill(at)gmail(dot)com>
Subject: Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
Date: 2025-12-16 10:29:00
Message-ID: CABPTF7WyXGvJrzJoZ+g5UCLTj3-HCMTtU4iHmU=FryTTOTf1dw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 10, 2025 at 11:22 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi,
>
> With a sorted committed.xip array, we could replace the iteration with
> two binary searches to find the interval to keep.
>
> Proposed Optimization
> ---------------------
>
> Use binary search to locate the boundaries of XIDs to remove, then
> compact with a single memmove. The key insight requires understanding
> how XID precedence relates to numeric ordering.
>
> XID Precedence Definition
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> PostgreSQL defines XID precedence as:
>
> /* compare two XIDs already known to be normal; this is a macro for speed */
> #define NormalTransactionIdPrecedes(id1, id2) \
>     (AssertMacro(TransactionIdIsNormal(id1) && TransactionIdIsNormal(id2)), \
>      (int32) ((id1) - (id2)) < 0)
>
> This means: id1 precedes id2 if (int32)(id1 - id2) < 0.
>
> Equivalently, this identifies all XIDs in the modular interval
> [id2 - 2^31, id2) on the 32-bit ring as "preceding id2". So XIDs
> preceding xmin are exactly those in [xmin - 2^31, xmin).
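>
> For example, with xmin = 100 the set of XIDs preceding xmin is the
> modular interval [100 - 2^31, 100), i.e. [2147483748, 2^32) U [0, 100)
> on the ring: NormalTransactionIdPrecedes(4000000000, 100) is true
> because (int32) (4000000000 - 100) = -294967396 < 0, while
> NormalTransactionIdPrecedes(200, 100) is false because
> (int32) (200 - 100) = 100 > 0.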
>
> From Modular Interval to Array Positions
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The arrays are sorted in numeric uint32 order (xip[i] <= xip[i+1] in the
> unsigned sense), which is a total order and not wraparound-aware. Therefore,
> the modular interval we want to remove may appear as one or two numeric
> blocks in the sorted array.
>
> Let boundary = xmin - 2^31 (mod 2^32). The modular interval [boundary, xmin)
> contains all XIDs to remove (half-open: xmin itself is kept, matching
> NormalTransactionIdPrecedes). Where does it appear in the numerically sorted
> array?
>
> Case A: (uint32)boundary <= (uint32)xmin (numeric no wrap)
> Example: xmin = 3,000,000,000
> boundary = 3,000,000,000 - 2,147,483,648 = 852,516,352
>
> Here, (uint32)boundary < (uint32)xmin, so the interval does not cross
> 0 numerically. In the sorted array, XIDs to remove form one contiguous
> block: [idx_boundary, idx_xmin).
>
> Array layout:
> [... keep ...][=== remove ===][... keep ...]
> 0 ............ idx_boundary ... idx_xmin ...... n
>
> Action: Keep prefix [0, idx_boundary) and suffix [idx_xmin, n).
>
> Case B: (uint32)boundary > (uint32)xmin (numeric wrap)
> Example: xmin = 100
> boundary = 100 - 2^31 (mod 2^32) = 2,147,483,748
>
> Since (uint32)boundary > (uint32)xmin, the interval wraps through 0
> numerically. In the sorted array, XIDs to remove form two blocks:
> [0, idx_xmin) and [idx_boundary, n).
>
> Array layout:
> [= remove =][... keep ...][= remove =]
> 0 ......... idx_xmin .... idx_boundary ......... n
>
> Action: Keep only the middle [idx_xmin, idx_boundary).
>
> Note: Case B often occurs when xmin is "small" (e.g., right after
> startup), making xmin - 2^31 wrap numerically. This is purely about
> positions in the numeric order; it does not imply the cluster has
> "wrapped" XIDs operationally.
>
> In both cases, we locate idx_boundary and idx_xmin using binary search
> in O(log n) time, then compact with a single memmove (a standalone
> sketch of these steps follows the list below).
>
> The algorithm:
> 1. Compute boundary = xmin - 2^31
> 2. Binary search for idx_boundary (first index with xip[i] >= boundary)
> 3. Binary search for idx_xmin (first index with xip[i] >= xmin)
> 4. Use memmove to compact based on case A or B above
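>
> A minimal standalone sketch of these steps (hypothetical names and plain
> C types rather than the backend's; the real patch may be structured
> differently):
>
> #include <stdint.h>
> #include <stdio.h>
> #include <string.h>
>
> typedef uint32_t TransactionId;
>
> /* First index i in [0, n) with xip[i] >= key, in plain uint32 order. */
> static size_t
> lower_bound(const TransactionId *xip, size_t n, TransactionId key)
> {
>     size_t  lo = 0;
>     size_t  hi = n;
>
>     while (lo < hi)
>     {
>         size_t  mid = lo + (hi - lo) / 2;
>
>         if (xip[mid] < key)
>             lo = mid + 1;
>         else
>             hi = mid;
>     }
>     return lo;
> }
>
> /* Remove all XIDs preceding xmin from a sorted array; return new count. */
> static size_t
> purge_sorted_xip(TransactionId *xip, size_t n, TransactionId xmin)
> {
>     TransactionId boundary = xmin - UINT32_C(0x80000000);  /* xmin - 2^31 */
>     size_t  idx_boundary = lower_bound(xip, n, boundary);
>     size_t  idx_xmin = lower_bound(xip, n, xmin);
>
>     if (boundary <= xmin)
>     {
>         /* Case A: remove the single block [idx_boundary, idx_xmin) */
>         memmove(&xip[idx_boundary], &xip[idx_xmin],
>                 (n - idx_xmin) * sizeof(TransactionId));
>         return n - (idx_xmin - idx_boundary);
>     }
>
>     /* Case B: keep only the middle block [idx_xmin, idx_boundary) */
>     memmove(&xip[0], &xip[idx_xmin],
>             (idx_boundary - idx_xmin) * sizeof(TransactionId));
>     return idx_boundary - idx_xmin;
> }
>
> int
> main(void)
> {
>     /* Sorted in plain uint32 order; xmin = 100 exercises Case B. */
>     TransactionId xip[] = {5, 50, 100, 200, 3000000000u};
>     size_t  n = purge_sorted_xip(xip, 5, 100);
>
>     for (size_t i = 0; i < n; i++)
>         printf("%u\n", (unsigned) xip[i]);  /* prints 100 and 200 */
>     return 0;
> }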
>
> Benefits
> --------
>
> 1. Performance: O(log n) binary search vs O(n) linear scan
> 2. Memory: No workspace allocation needed
> 3. Simplicity: One memmove instead of allocate + two copies + free
>
> The same logic is applied to both committed.xip and catchange.xip arrays.
>
> Faster binary search
> --------------------
>
> While faster binary-search variants exist, the current code already
> introduces more complexity than the original, and it is unclear whether
> further optimization would deliver a meaningful performance gain.
>

I've adapted the patch to use a two-phase optimization (a rough sketch of
the dispatch follows the list below):

- Pre-CONSISTENT: use O(n) in-place compaction, since committed.xip is
unsorted during this phase.

- Post-CONSISTENT: use the O(log n) binary search, since committed.xip is
maintained in sorted order after reaching consistency.
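
A rough sketch of how the dispatch might look inside snapbuild.c; the
helper purge_sorted_xip is hypothetical (standing in for the sorted-array
purge sketched in the quoted text above), and the actual patch may be
shaped differently:

if (builder->state < SNAPBUILD_CONSISTENT)
{
    /* Phase 1: committed.xip not sorted yet; one O(n) in-place pass */
    size_t  keep = 0;

    for (size_t off = 0; off < builder->committed.xcnt; off++)
    {
        if (!NormalTransactionIdPrecedes(builder->committed.xip[off],
                                         builder->xmin))
            builder->committed.xip[keep++] = builder->committed.xip[off];
    }
    builder->committed.xcnt = keep;
}
else
{
    /* Phase 2: committed.xip kept sorted; binary search + one memmove */
    builder->committed.xcnt = purge_sorted_xip(builder->committed.xip,
                                               builder->committed.xcnt,
                                               builder->xmin);
}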

--
Best,
Xuneng

Attachment Content-Type Size
v2-0001-Optimize-SnapBuild-by-maintaining-committed.xip-i.patch application/x-patch 11.0 KB
v2-0002-Optimize-SnapBuildPurgeOlderTxn-with-two-phase-ap.patch application/x-patch 8.8 KB
