Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: klemen kobau <klemen(dot)kobau(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
Date: 2026-05-13 03:30:41
Message-ID: CABPTF7WUzwywrDmB+W=Sm=ynzQkcS83p2z-Ywt8W8XdKhXOPcQ@mail.gmail.com
Lists: pgsql-bugs

> >> --- eager baseline sweep
> >> The attached patch records the baseline eagerly at transaction
> >> boundaries instead of lazily at counter-increment sites.
> >> pgstat_set_pending_baselines() iterates the pgStatPending list and
> >> snapshots each entry's current counts into an xact_baseline field via
> >> struct assignment. It is called from AtEOXact_PgStat() (after folding
> >> transactional counts and removing dropped entries) and from
> >> PostPrepare_PgStat() (after relation cleanup), covering commit, abort,
> >> and PREPARE TRANSACTION. The view accessors unconditionally subtract
> >> the baseline. For entries created in the current transaction,
> >> xact_baseline is zero-initialized, so the subtraction is a no-op.
> >>
> >> I don’t have a clear preference between the two approaches; both are
> >> presented for review.
> >>
>
> It would be useful to verify the fix by manually applying the patch
> and building the instance. Additionally, a few issues surfaced after
> looking at it again, which I will update later.
>

Here is the updated version using eager baseline refresh, i.e. sweeping all
backend-local pending pgstat entries at each top-level transaction boundary.

I tested the eager-baseline approach at both micro and macro levels. The
results show the same cost shape in both cases: the patch cost scales with the
number of backend-local pending pgstat entries that must be swept at each
top-level transaction boundary.

The microbenchmark isolates the transaction-boundary cost. It first creates a
controlled number of pending pgstat entries in one backend, then times 10000
tiny BEGIN/COMMIT boundaries in one simple-query message:

BEGIN; COMMIT; BEGIN; COMMIT; ...

The results below use matching -O0 --enable-debug --enable-cassert builds for
both installs.

pending entries   unpatched us/xact       patched us/xact         patch delta
-----------------------------------------------------------------------------
              0   713.526                 703.357                 -10.169
            100   740.954                 754.298                 +13.344
           1000   1183.302 / 1177.393     1211.533 / 1213.089     about +32 us
           5000   2978.008 / 2967.674     3236.365 / 3081.945     about +186 us (median)

Machine: Mac mini, M4 Pro, 48GB mem

This is intentionally hostile to eager sweeping: one backend accumulates many
pending entries, then repeatedly crosses top-level transaction boundaries while
doing almost no useful work inside the transactions. The added cost is small at
100 pending entries, where noise matters, but becomes clear at 1000 and 5000
entries. In this debug/cassert build, the patch adds roughly 0.03-0.04 us per
pending entry per transaction in the 1000-5000 entry range. Absolute numbers
are inflated by the build profile, but the linear shape is the relevant signal.

The macro benchmark shows when that same boundary cost is visible in a more
query-shaped workload:

workload                        base tps   patched tps   change
----------------------------------------------------------------
5000 tables, 1 row/table           11216          6231     -44%
1000 tables, 1000 rows/table        7683          7113      -7%
100 tables, 10000 rows/table        2152          2162      ~0%

The latency breakdown explains the TPS pattern. For the 5000-table/1-row case:

base: avg_lat=0.089 ms, select_avg=0.051 ms
patched: avg_lat=0.160 ms, select_avg=0.062 ms

Machine: Intel Xeon server, 40 cores, 128GB mem

The SELECT itself barely changes. Most of the regression appears outside the
SELECT, where the patch does the baseline sweep at transaction end. This is
the worst case for the eager design: the transaction does very little real table
work, but the backend has thousands of pending relation stats entries.

As the number of pending entries drops, or as the query does more real scan
work, the same fixed boundary cost is diluted. With 1000 tables and 1000 rows
per table, the regression falls to about 7%. With 100 tables and 10000 rows per
table, the scan dominates and the sweep over about 100 pending entries is lost
in noise.

Taken together, the benchmarks confirm the expected implementation cost model:

eager baseline refresh cost ~= O(number of pending pgstat entries per backend)

Row count does not directly drive the cost; it only hides or exposes the fixed
transaction-boundary work.

These results suggest that the eager-sweeping approach has an unfavorable
cost model for long-lived sessions that accumulate many pending stats entries
and then execute small transactions. A lazy baseline approach, where each
pending entry records the current transaction generation only when that entry
is first touched, should avoid the transaction-boundary sweep and make the
cost scale with the transaction's actual working set instead. However, it
still carries the potential overhead of an additional comparison on hot
paths, as well as increased maintenance burden.

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.

Attachment Content-Type Size
v2-0001-Fix-pg_stat_xact_-views-leaking-across-xact-bound.patch application/octet-stream 17.7 KB
pgstat_xact_macro_bench.sh text/x-sh 7.4 KB
pgstat_xact_micro_bench.sh text/x-sh 6.5 KB
