Quick Links

Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array

From:	Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To:	Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
Date:	2025-10-20 11:08:50
Message-ID:	CABPTF7VtK80J2uVayacWqjqzOwnFdFMgmvc5VUp5F3LqMyFWSA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

Kirill Reshke <reshkekirill(at)gmail(dot)com> 于 2025年10月20日周一下午6:07写道：

> On Mon, 20 Oct 2025 at 13:47, Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > On Mon, Oct 20, 2025 at 11:36 AM Kirill Reshke <reshkekirill(at)gmail(dot)com>
> wrote:
> > >
> > > On Mon, 20 Oct 2025 at 08:08, Xuneng Zhou <xunengzhou(at)gmail(dot)com>
> wrote:
> > > >
> > > > Hi, thanks for looking into this.
> > > >
> > > > On Sat, Oct 18, 2025 at 4:59 PM Kirill Reshke <
> reshkekirill(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Sat, 18 Oct 2025 at 12:50, Xuneng Zhou <xunengzhou(at)gmail(dot)com>
> wrote:
> > > > > >
> > > > > > Hi Hackers,
> > > > >
> > > > > Hi!
> > > > >
> > > > > > The SnapBuildPurgeOlderTxn function previously used a suboptimal
> > > > > > method to remove old XIDs from the committed.xip array. It
> allocated a
> > > > > > temporary workspace array, copied the surviving elements into
> it, and
> > > > > > then copied them back, incurring unnecessary memory allocation
> and
> > > > > > multiple data copies.
> > > > > >
> > > > > > This patch refactors the logic to use a standard two-pointer,
> in-place
> > > > > > compaction algorithm. The new approach filters the array in a
> single
> > > > > > pass with no extra memory allocation, improving both CPU and
> memory
> > > > > > efficiency.
> > > > > >
> > > > > > No behavioral changes are expected. This resolves a TODO comment
> > > > > > expecting a more efficient algorithm.
> > > > > >
> > > > >
> > > > > Indeed, these changes look correct.
> > > > > I wonder why b89e151054a0 did this place this way, hope we do not
> miss
> > > > > anything here.
> > > >
> > > > I think this small refactor does not introduce behavioral changes or
> > > > breaks given constraints.
> > > >
> > > > > Can we construct a microbenchmark here which will show some
> benefit?
> > > > >
> > > >
> > > > I prepared a simple microbenchmark to evaluate the impact of the
> > > > algorithm replacement. The attached results summarize the findings.
> > > > An end-to-end benchmark was not included, as this function is
> unlikely
> > > > to be a performance hotspot in typical decoding workloads—the array
> > > > being cleaned is expected to be relatively small under normal
> > > > operating conditions. However, its impact could become more
> noticeable
> > > > in scenarios with long-running transactions and a large number of
> > > > catalog-modifying DML or DDL operations.
> > > >
> > > > Hardware:
> > > > AMD EPYC™ Genoa 9454P 48-core 4th generation
> > > > DDR5 ECC reg
> > > > NVMe SSD Datacenter Edition (Gen 4)
> > > >
> > > > Best,
> > > > Xuneng
> > >
> > > At first glance these results look satisfactory.
> > >
> > > Can you please describe, how did you get your numbers? Maybe more
> > > script or steps to reproduce, if anyone will be willing to...
> > >
> >
> > Sure. Here is a brief description of this experiential benchmark:
> >
> > 1) what Tier 1 measures
> >
> > Function under test: committed.xip purge in SnapBuild
> > (OLD=workspace+memcpy vs NEW=in-place compaction).
> >
> > Inputs:
> > Array sizes: 100, 500, 1000, 2000, 5000, 10000
> > Keep ratios: 0.9, 0.5, 0.1, 0.01
> > Distributions: scattered (Fisher–Yates shuffle), contiguous
> > Repetitions: 30 per scenario
> >
> > RNG and determinism: pg_prng_state with seed 42 per dataset ensures
> > reproducibility.
> >
> > Metrics recorded per scenario:
> > Time (ns): mean, median, p95
> > Survivors (count)
> > Memory traffic (bytes): bytes_read, bytes_written, bytes_total
> > OLD: reads = (xcnt + survivors) × sizeof(XID); writes = 2 × survivors
> > × sizeof(XID)
> > NEW: reads = xcnt × sizeof(XID); writes = survivors × sizeof(XID)
> >
> > 2) The core components
> >
> > # C Extension (snapbuild_bench.c) - The actual benchmark implementation
> >
> > The C extension contains the actual benchmark implementation that runs
> > inside the PostgreSQL backend process. It's designed to:
> > - Mimic real PostgreSQL code paths** as closely as possible
> > - Use actual PostgreSQL data structures** (`TransactionId`,
> `MemoryContext`)
> > - Call real PostgreSQL functions** (`NormalTransactionIdPrecedes`)
> > - Measure with nanosecond precision** using PostgreSQL's timing
> infrastructure
> >
> > # SQL Wrapper (snapbuild_bench--1.0.sql) - function definitions
> >
> > # Orchestration Scripts - Automated benchmark execution and analysis
> > run_snapbuild_purge_bench
> >
> > 3) Execution Flow
> >
> > 1. Extension Installation
> > # Build and install
> > export PG_CONFIG=$HOME/pg/vanilla/bin/pg_config
> > make -C contrib_extension/snapbuild_bench clean install
> > # Create extension in database
> > CREATE EXTENSION snapbuild_bench;
> >
> > 3. Run full benchmark suite
> > ./run_snapbuild_purge_bench.sh --clean --with-baseline <patch>
> >
> > 4. Data Analysis
> > # Generate plots
> > python3 plot_tier1_results.py --csv results/unit/base_unit.csv --out
> plots/
> > # Compare baseline vs patched
> > python3 compare_snapbuild_results.py vanilla/ patched/
>
> Thank you.
>
>
> > TBH, the performance improvement from this refactor is fairly
> > straightforward, and it’s unlikely to introduce regressions.
>
> Sure.
> > The
> > experimental benchmark is therefore more complex than necessary.
> > Still, I treated it as a learning exercise — an opportunity to
> > practice benchmarking methodology and hopefully to reuse some of these
> > techniques when evaluating more performance-critical paths in the
> > future. If anyone has suggestions or spots issues, I’d greatly
> > appreciate your feedback as well.
> >
> > Best,
> > Xuneng
>
> I think this patch is in committable state, should we change its CF
> entry accordingly?
>

Feel free to do so if you agree. Thanks again for your review!

Best,
Xuneng

In response to

Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array at 2025-10-20 10:06:54 from Kirill Reshke

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Nikolay Shaplov	2025-10-20 11:14:29	Re: PATCH for BUG #18785: Pointer bmr.rel, dereferenced by passing as 1st parameter to function is checked for NULL later
Previous Message	Kirill Reshke	2025-10-20 10:06:54	Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array