Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Kirill Reshke <reshkekirill(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
Date: 2025-10-21 06:13:26
Message-ID: CABPTF7XmcYhFgDefTLJq2awOG-y0Qg+_OpKYgx=2WgMED3ui_A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for looking into this.

On Tue, Oct 21, 2025 at 1:05 PM Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
>
> On Tue, 21 Oct 2025 at 04:31, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > On Sat, Oct 18, 2025 at 01:59:40PM +0500, Kirill Reshke wrote:
> > > Indeed, these changes look correct.
> > > I wonder why b89e151054a0 did this place this way, hope we do not miss
> > > anything here.
> >
> > Perhaps a lack of time back in 2014? It feels like an item where we
> > would need to research a bit some of the past threads, and see if this
> > has been discussed, or if there were other potential alternatives
> > discussed. This is not saying that what you are doing in this
> > proposal is actually bad, but it's a bit hard to say what an
> > "algorithm" should look like in this specific code path with XID
> > manipulations. Perhaps since 2014, we may have other places in the
> > tree that share similar characteristics as what's done here.
> >
> > So it feels like this needs a bit more historical investigation first,
> > rather than saying that your proposal is the best choice on the table.
> > --
> > Michael
>
> Sure
>
> Commit b89e151054a0 comes from [0]
> Comment of SnapBuildPurgeCommittedTxn tracks to [1] (it was in form
> "XXX: Neater algorithm?")
>
> Between these two messages, it was not disucccesseed...
>
> I will also study other related threads like [2], but i don't think
> they will give more insight here.
>
> [0] https://www.postgresql.org/message-id/20140303162652.GB16654%40awork2.anarazel.de
> [1] https://www.postgresql.org/message-id/20140115002223.GA17204%40awork2.anarazel.de
> [2] https://www.postgresql.org/message-id/flat/20130914204913.GA4071%40awork2.anarazel.de#:~:text=20130914204913.GA4071%40awork2.anarazel.de

Introducing logical decoding was a major feature, and I assume there
were more critical design decisions and trade-offs to consider at that
time.

I proposed this refactor not because it is the most significant
optimization, but because it seems to be a low-hanging fruit with
clear benefits. By using an in-place two-pointer compaction, we can
eliminate the extra memory allocation and copy-back step without
introducing risks to this well-tested code path.

A comparable optimization exists in KnownAssignedXidsCompress() which
uses the same algorithm to remove stale XIDs without workspace
allocation. That implementation also adds a lazy compaction heuristic
that delays compaction until a threshold of removed entries is
reached, amortizing the O(N) cost across multiple operations.

The comment above the data structure mentions the trade-off of keeping
the committed.xip array sorted versus unsorted. If the array were
sorted, we could use a binary search combined with memmove to compact
it efficiently, achieving O(log n + n) complexity for purging.
However, that design would increase the complexity of
SnapBuildAddCommittedTxn from O(1) to O(n) and "more complicated wrt
wraparound".

/*
* Array of committed transactions that have modified the catalog.
*
* As this array is frequently modified we do *not* keep it in
* xidComparator order. Instead we sort the array when building &
* distributing a snapshot.
*
* TODO: It's unclear whether that reasoning has much merit. Every
* time we add something here after becoming consistent will also
* require distributing a snapshot. Storing them sorted would
* potentially also make it easier to purge (but more complicated wrt
* wraparound?). Should be improved if sorting while building the
* snapshot shows up in profiles.
*/
TransactionId *xip;
} committed;

/*
* Keep track of a new catalog changing transaction that has committed.
*/
static void
SnapBuildAddCommittedTxn(SnapBuild *builder, TransactionId xid)
{
Assert(TransactionIdIsValid(xid));

if (builder->committed.xcnt == builder->committed.xcnt_space)
{
builder->committed.xcnt_space = builder->committed.xcnt_space * 2 + 1;

elog(DEBUG1, "increasing space for committed transactions to %u",
(uint32) builder->committed.xcnt_space);

builder->committed.xip = repalloc(builder->committed.xip,
builder->committed.xcnt_space * sizeof(TransactionId));
}

/*
* TODO: It might make sense to keep the array sorted here instead of
* doing it every time we build a new snapshot. On the other hand this
* gets called repeatedly when a transaction with subtransactions commits.
*/
builder->committed.xip[builder->committed.xcnt++] = xid;
}

It might be worth profiling this function to evaluate whether
maintaining a sorted array could bring potential benefits, although
accurately measuring its end-to-end impact could be difficult if it
isn’t a known hotspot. I also did a brief search on the mailing list
and found no reports of performance concerns or related proposals to
optimize this part of the code.

[1] https://www.postgresql.org/search/?m=1&q=SnapBuildPurgeOlderTxn&l=1&d=-1&s=r
[2] https://www.postgresql.org/search/?m=1&q=committed.xip&l=1&d=-1&s=r&p=2

Best,
Xuneng

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message KAZAR Ayoub 2025-10-21 06:17:01 Re: Speed up COPY FROM text/CSV parsing using SIMD
Previous Message Fujii Masao 2025-10-21 06:11:26 Re: Invalid primary_slot_name triggers warnings in all processes on reload