Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
Date: 2025-10-24 18:35:41
Message-ID: CAD21AoBwmjgz4LsGhJT52DhW4riN0sJ4d4qZg-WQU5jApO8umA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 23, 2025 at 1:17 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi Sawada-san, Michael,
>
> Thanks for your comments on this patch.
>
> On Thu, Oct 23, 2025 at 8:28 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Oct 20, 2025 at 11:13 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> > >
> > > Hi,
> > >
> > > Thanks for looking into this.
> > >
> > > On Tue, Oct 21, 2025 at 1:05 PM Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
> > > >
> > > > On Tue, 21 Oct 2025 at 04:31, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> > > > >
> > > > > On Sat, Oct 18, 2025 at 01:59:40PM +0500, Kirill Reshke wrote:
> > > > > > Indeed, these changes look correct.
> > > > > > > I wonder why b89e151054a0 implemented it this way; I hope we are not
> > > > > > > missing anything here.
> > > > >
> > > > > Perhaps a lack of time back in 2014? It feels like an item where we
> > > > > would need to research a bit some of the past threads, and see if this
> > > > > has been discussed, or if there were other potential alternatives
> > > > > discussed. This is not saying that what you are doing in this
> > > > > proposal is actually bad, but it's a bit hard to say what an
> > > > > "algorithm" should look like in this specific code path with XID
> > > > > manipulations. Perhaps since 2014, we may have other places in the
> > > > > tree that share similar characteristics as what's done here.
> > > > >
> > > > > So it feels like this needs a bit more historical investigation first,
> > > > > rather than saying that your proposal is the best choice on the table.
>
> Following these suggestions, I carefully searched the mailing list
> archives and found no reports of performance issues directly related
> to this code path. I also examined other parts of the codebase for
> similar patterns. Components like integerset might share some
> characteristics with SnapBuildPurgeOlderTxn, but they have constraints
> that make them not directly applicable here. I am not very familiar
> with the whole tree, so the investigation might not be exhaustive.

Thank you for looking through the archives. In logical replication,
performance problems typically show up as replication delays. Since
logical replication involves many different components and processes,
it's quite rare for investigations to trace problems back to this
specific piece of code. However, I still believe it's important to
optimize the performance of logical decoding itself.

>
> > >
> > > A comparable optimization exists in KnownAssignedXidsCompress() which
> > > uses the same algorithm to remove stale XIDs without workspace
> > > allocation. That implementation also adds a lazy compaction heuristic
> > > that delays compaction until a threshold of removed entries is
> > > reached, amortizing the O(N) cost across multiple operations.
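
For readers following the thread, the in-place compaction idea described
above can be sketched in a few lines of C. This is a standalone
illustration under stated assumptions, not the patch itself:
TransactionId is redefined locally, and xid_precedes() and
purge_older_in_place() are hypothetical names. xid_precedes() is a
simplified stand-in for PostgreSQL's wraparound-aware comparison of
normal XIDs. The lazy threshold heuristic mentioned for
KnownAssignedXidsCompress() is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;    /* local stand-in for the PostgreSQL type */

/*
 * Hypothetical helper: wraparound-aware "a is older than b" for normal
 * XIDs, using the sign of the 32-bit difference.
 */
static int
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Remove every XID older than xmin from xip[0..xcnt), keeping the
 * survivors in their original relative order. No temporary array is
 * allocated: survivors are copied toward the front of the same array.
 * Returns the new element count.
 */
static size_t
purge_older_in_place(TransactionId *xip, size_t xcnt, TransactionId xmin)
{
    size_t      off;
    size_t      surviving = 0;

    assert(xip != NULL || xcnt == 0);

    for (off = 0; off < xcnt; off++)
    {
        if (xid_precedes(xip[off], xmin))
            continue;           /* older than the cutoff, drop it */

        /* The write index never runs ahead of the read index. */
        xip[surviving++] = xip[off];
    }

    return surviving;
}
```

A caller would assign the return value back to its element count, for
example xcnt = purge_older_in_place(xip, xcnt, xmin). The pass is O(n)
in time and needs no extra memory.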
> > >
> > > The comment above the data structure mentions the trade-off of keeping
> > > the committed.xip array sorted versus unsorted. If the array were
> > > sorted, we could use a binary search combined with memmove to compact
> > > it efficiently, achieving O(log n + n) complexity for purging.
> > > However, that design would increase the complexity of
> > > SnapBuildAddCommittedTxn from O(1) to O(n) and "more complicated wrt
> > > wraparound".
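
The sorted alternative described above can be sketched the same way.
This is an illustration only, under a loud assumption: the array is kept
in ascending order by plain numeric XID value and no XID wraparound
occurs among the stored values, which is exactly the part the source
comment flags as "more complicated". All function names here are
hypothetical, and the caller is assumed to have reserved room for one
more element before inserting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;    /* local stand-in for the PostgreSQL type */

/*
 * Index of the first element >= key in an ascending array (binary search).
 * ASSUMPTION: plain numeric order, no wraparound handling.
 */
static size_t
lower_bound_xid(const TransactionId *xip, size_t xcnt, TransactionId key)
{
    size_t      lo = 0;
    size_t      hi = xcnt;

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (xip[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Purge: O(log n) to find the cutoff, then one memmove of the survivors
 * to the front. Returns the new element count.
 */
static size_t
purge_older_sorted(TransactionId *xip, size_t xcnt, TransactionId xmin)
{
    size_t      first_kept = lower_bound_xid(xip, xcnt, xmin);
    size_t      surviving = xcnt - first_kept;

    if (first_kept > 0 && surviving > 0)
        memmove(xip, xip + first_kept, surviving * sizeof(TransactionId));
    return surviving;
}

/*
 * Insert: this is where the cost moves. Keeping the order requires
 * shifting the tail, so adding an XID becomes O(n) instead of O(1).
 * The caller must guarantee capacity for xcnt + 1 elements.
 */
static size_t
insert_sorted(TransactionId *xip, size_t xcnt, TransactionId xid)
{
    size_t      pos = lower_bound_xid(xip, xcnt, xid);

    assert(xip != NULL);
    memmove(xip + pos + 1, xip + pos, (xcnt - pos) * sizeof(TransactionId));
    xip[pos] = xid;
    return xcnt + 1;
}
```

The sketch makes the trade-off concrete: purging gets a cheap cutoff
search, while every insertion pays for a shift of the tail.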
> > >
> > >     /*
> > >      * Array of committed transactions that have modified the catalog.
> > >      *
> > >      * As this array is frequently modified we do *not* keep it in
> > >      * xidComparator order. Instead we sort the array when building &
> > >      * distributing a snapshot.
> > >      *
> > >      * TODO: It's unclear whether that reasoning has much merit. Every
> > >      * time we add something here after becoming consistent will also
> > >      * require distributing a snapshot. Storing them sorted would
> > >      * potentially also make it easier to purge (but more complicated wrt
> > >      * wraparound?). Should be improved if sorting while building the
> > >      * snapshot shows up in profiles.
> > >      */
> > >     TransactionId *xip;
> > > } committed;
> > >
> > > /*
> > >  * Keep track of a new catalog changing transaction that has committed.
> > >  */
> > > static void
> > > SnapBuildAddCommittedTxn(SnapBuild *builder, TransactionId xid)
> > > {
> > >     Assert(TransactionIdIsValid(xid));
> > >
> > >     if (builder->committed.xcnt == builder->committed.xcnt_space)
> > >     {
> > >         builder->committed.xcnt_space = builder->committed.xcnt_space * 2 + 1;
> > >
> > >         elog(DEBUG1, "increasing space for committed transactions to %u",
> > >              (uint32) builder->committed.xcnt_space);
> > >
> > >         builder->committed.xip = repalloc(builder->committed.xip,
> > >                                           builder->committed.xcnt_space * sizeof(TransactionId));
> > >     }
> > >
> > >     /*
> > >      * TODO: It might make sense to keep the array sorted here instead of
> > >      * doing it every time we build a new snapshot. On the other hand this
> > >      * gets called repeatedly when a transaction with subtransactions commits.
> > >      */
> > >     builder->committed.xip[builder->committed.xcnt++] = xid;
> > > }
> > >
> > > It might be worth profiling this function to evaluate whether
> > > maintaining a sorted array could bring potential benefits, although
> > > accurately measuring its end-to-end impact could be difficult if it
> > > isn’t a known hotspot. I also did a brief search on the mailing list
> > > and found no reports of performance concerns or related proposals to
> > > optimize this part of the code.
> >
> > It might also be worth researching what kind of workloads would need a
> > better algorithm for storing and updating the xip and subxip arrays,
> > since that would be the primary motivation. Without such workloads we
> > cannot measure the real-world impact of a new algorithm. Having said
> > that, I think this should be discussed and developed separately from
> > the patch proposed on this thread.
>
> +1 for researching the workloads that might need a sorted array and
> more efficient algorithm. This exploration isn’t limited to the scope
> of SnapBuildPurgeOlderTxn but relates more broadly to the overall
> snapbuild process, which might be worth discussing in a separate
> thread as suggested.
>
> * TODO: It's unclear whether that reasoning has much merit. Every
> * time we add something here after becoming consistent will also
> * require distributing a snapshot. Storing them sorted would
> * potentially also make it easier to purge (but more complicated wrt
> * wraparound?). Should be improved if sorting while building the
> * snapshot shows up in profiles.
>
> I also constructed an artificial workload to try to surface the qsort
> call in SnapBuildBuildSnapshot, though such a scenario seems very
> unlikely to occur in production.
>
> for ((c=1; c<=DDL_CLIENTS; c++)); do
>   (
>     local seq=1
>     while (( $(date +%s) < tB_end )); do
>       local tbl="hp_ddl_${c}_$seq"
>       "$psql" -h 127.0.0.1 -p "$PORT" -d postgres -c "
>         BEGIN;
>         CREATE TABLE ${tbl} (id int, data text);
>         CREATE INDEX idx_${tbl} ON ${tbl} (id);
>         INSERT INTO ${tbl} VALUES ($seq, 'd');
>         DROP TABLE ${tbl};
>         COMMIT;" >/dev/null 2>&1 || true
>       seq=$((seq+1))
>     done
>   )

Interesting. To be honest, I think this scenario might actually occur
in practice, especially in cases where users frequently use CREATE
TEMP TABLE ... ON COMMIT DROP.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
