Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: klemen kobau <klemen(dot)kobau(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
Date: 2026-05-07 06:35:49
Message-ID: CABPTF7XxssS5dptcp7_AxeOAXj8Vh7cKvPJR+4PayWhZhQzDqg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi klemen,

On Wed, May 6, 2026 at 4:03 PM klemen kobau <klemen(dot)kobau(at)gmail(dot)com> wrote:
>
> Hi Xuneng,
>
> Thanks for the analysis and the patch proposals. As the original reporter, I wanted to follow up since the thread has been quiet for over a month.
>
> I am new to this process, do I need to do anything to help the process?
>
> Kind regards
>
> Klemen Kobau
>
>
> On Wed, 6 May 2026 at 09:56, Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>>
>> On Sun, Mar 29, 2026 at 11:17 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>> >
>> > On Sat, Mar 28, 2026 at 1:47 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>> > >
>> > > On Fri, Mar 27, 2026 at 10:15 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>> > > >
>> > > > Hi klemen,
>> > > >
>> > > > Thanks for the report.
>> > > >
>> > > > On Fri, Mar 27, 2026 at 5:36 AM PG Bug reporting form
>> > > > <noreply(at)postgresql(dot)org> wrote:
>> > > > >
>> > > > > The following bug has been logged on the website:
>> > > > >
>> > > > > Bug reference: 19439
>> > > > > Logged by: klemen kobau
>> > > > > Email address: klemen(dot)kobau(at)gmail(dot)com
>> > > > > PostgreSQL version: 18.0
>> > > > > Operating system: Linux (EndeavorOS)
>> > > > > Description:
>> > > > >
>> > > > > I am running postgres:18.0 in a docker container, the configuration is as
>> > > > > follows:
>> > > > >
>> > > > > postgres:
>> > > > > image: postgres:18.0
>> > > > > command: [
>> > > > > "postgres",
>> > > > > "-N", "200",
>> > > > > "-c", "max_prepared_transactions=100",
>> > > > > "-c", "wal_level=logical",
>> > > > > "-c", "max_wal_senders=10",
>> > > > > "-c", "max_replication_slots=20",
>> > > > > "-c", "wal_keep_size=10",
>> > > > > "-c", "max_slot_wal_keep_size=1024"
>> > > > > ]
>> > > > > environment:
>> > > > > POSTGRES_USER: postgres
>> > > > > POSTGRES_PASSWORD: postgres
>> > > > > POSTGRES_DB: postgres
>> > > > > TZ: UTC
>> > > > > PGTZ: UTC
>> > > > > ports:
>> > > > > - 5432:5432
>> > > > > volumes:
>> > > > > - postgres_data:/var/lib/postgresql
>> > > > >
>> > > > > I use psql version 18.3.
>> > > > >
>> > > > > I run the following:
>> > > > >
>> > > > > psql -h localhost -p 5432 -U postgres -d postgres -c "
>> > > > > -- Transaction 1: insert 1 row, check stats, commit
>> > > > > BEGIN;
>> > > > > CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY, val text);
>> > > > > INSERT INTO temp.xact_test (val) VALUES ('a');
>> > > > > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
>> > > > > 'xact_test';
>> > > > > COMMIT;
>> > > > >
>> > > > > -- Transaction 2: insert 1 row, check stats
>> > > > > BEGIN;
>> > > > > INSERT INTO temp.xact_test (val) VALUES ('b');
>> > > > > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
>> > > > > 'xact_test';
>> > > > > ROLLBACK;
>> > > > >
>> > > > > -- Cleanup
>> > > > > DROP TABLE temp.xact_test;
>> > > > > "
>> > > > >
>> > > > > And the output is
>> > > > >
>> > > > > BEGIN
>> > > > > CREATE TABLE
>> > > > > INSERT 0 1
>> > > > > relname | n_tup_ins
>> > > > > -----------+-----------
>> > > > > xact_test | 1
>> > > > > (1 row)
>> > > > >
>> > > > > COMMIT
>> > > > > BEGIN
>> > > > > INSERT 0 1
>> > > > > relname | n_tup_ins
>> > > > > -----------+-----------
>> > > > > xact_test | 2
>> > > > > (1 row)
>> > > > >
>> > > > > ROLLBACK
>> > > > > DROP TABLE
>> > > >
>> > > > I can also reproduce this behavior on HEAD.
>> > > >
>> > > > > This is not what I would expect from reading
>> > > > > https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS,
>> > > > > where it states
>> > > > >
>> > > > > > A transaction can also see its own statistics (not yet flushed out to the
>> > > > > shared memory statistics) in the views pg_stat_xact_all_tables,
>> > > > > pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
>> > > > > pg_stat_xact_user_functions. These numbers do not act as stated above;
>> > > > > instead they update continuously throughout the transaction.
>> > > > >
>> > > > > based on this, I would expect that the numbers are updated each time an
>> > > > > insert happens and that they are isolated per transaction.
>> > > > >
>> > > >
>> > > > This snippet of the doc feels somewhat ambiguous. The current behavior
>> > > > does not seem fully consistent with wording such as “throughout the
>> > > > transaction.” It also seems more appropriate for these table views to
>> > > > reflect stats within their own scopes, rather than cumulative values
>> > > > spanning txn boundaries.
>> > > >
>> > >
>> > > I’ve looked into this issue and 'd like to propose a patch to address
>> > > it. Feedback is very welcome.
>> > >
>> > > --- Root cause
>> > >
>> > > Since the stats subsystem was rewritten (commit 5891c7a8), each
>> > > backend keeps a hash of pending stats entries that persist across
>> > > transaction boundaries. Entries that flush successfully are deleted
>> > > from the backend-local pending list; entries not flushed remain
>> > > pending with their accumulated counters. Flushing is driven by
>> > > pgstat_report_stat(), called from the ReadyForQuery path, and subject
>> > > to a rate limiter.
>> > >
>> > > The pg_stat_xact_* views read these pending entries directly via
>> > > find_tabstat_entry() / find_funcstat_entry(). Both functions return
>> > > the raw accumulated counters without per-transaction scoping.
>> > >
>> > > The bug is deterministic when multiple top-level transactions are
>> > > processed within a single simple-query message, because there is no
>> > > ReadyForQuery boundary between the transactions and therefore no
>> > > opportunity to flush and remove the pending entry.
>> > >
>> > > --- Fix
>> > >
>> > > The patch introduces a per-entry "transaction baseline" — a lazy
>> > > snapshot of the counters taken the first time an entry is touched in
>> > > each new top-level transaction. The accessor functions
>> > > (find_tabstat_entry, find_funcstat_entry) then subtract the baseline
>> > > from the current counters, yielding only the current transaction's
>> > > delta.
>> > >
>> > > The baseline is keyed by MyProc->vxid.lxid. A static inline helper,
>> > > pgstat_ensure_xact_baseline(), is called at every nontransactional
>> > > counter-increment site (the 7 event-counter macros in pgstat.h and the
>> > > 4 non-inline counter functions in pgstat_relation.c). After the first
>> > > call per entry per transaction, the check reduces to a single integer
>> > > comparison with a predictably not-taken branch.
>> > >
>> > > For function stats, a new PgStat_FunctionPending wrapper struct embeds
>> > > PgStat_FunctionCounts at offset 0 with the baseline fields appended,
>> > > so the flush callback requires only a trivial cast change.
>> > >
>> > > --- Testing
>> > >
>> > > The regression test is a TAP test under src/bin/psql/t/ that uses psql
>> > > -c to send multi-statement strings as single simple-query messages.
>> > > src/bin/psql/t look like the right existing harness for the primary
>> > > regression because psql -c sends the whole multi-statement string as
>> > > one simple-query message via simple_action_list dispatch. By contrast,
>> > > ordinary pg_regress SQL files are executed by psql in file/stdin mode,
>> > > which runs through MainLoop() and dispatches top-level statements one
>> > > at a time via SendQuery(), creating ReadyForQuery boundaries that
>> > > could mask the bug.
>> > >
>> > > The test covers three scenarios: table n_tup_ins and seq_scan
>> > > counters, and function calls — each verifying isolation across
>> > > consecutive top-level transactions within a single message.
>> > >
>> >
>> > postgres % cat output.txt
>> > BEGIN
>> > CREATE TABLE
>> > INSERT 0 1
>> > relname | n_tup_ins
>> > -----------+-----------
>> > xact_test | 1
>> > (1 row)
>> >
>> > COMMIT
>> > BEGIN
>> > INSERT 0 1
>> > relname | n_tup_ins
>> > -----------+-----------
>> > xact_test | 2
>> > (1 row)
>> >
>> > ROLLBACK
>> > DROP TABLE
>> >
>> > I can also reproduce this behavior in v13. So this issue pre-existed
>> > the major refactor 5891c7a8.
>> >
>>
>> After further thought of v1, I'd like to also propose an alternative
>> approach. The root cause and testing rationale are the same as
>> described upthread; what changes is how the baseline is established.
>>
>> --- Problems with the lazy (v1) approach
>> The v1 patch snapshots the baseline lazily -- on every
>> nontransactional counter increment, it checks whether baseline_lxid
>> matches the current
>> transaction and snapshots if not. This has two drawbacks:
>>
>> 1) Hot-path overhead
>> Every pgstat_count_heap_scan(), pgstat_count_buffer_read_block(), etc.
>> acquires a branch. After the first call per entry per transaction, the
>> branch is predictably not-taken, but it could still present on some of
>> the most frequently executed paths in the backend.
>>
>> 2) Maintenance burden
>> Every new counter added to PgStat_TableCounts requires a corresponding
>> baseline snapshot at its increment site. Miss one, and that counter
>> silently leaks across transactions
>>
>> --- eager baseline sweep
>> The attached patch records the baseline eagerly at transaction
>> boundaries instead of lazily at counter-increment sites.
>> pgstat_set_pending_baselines() iterates the pgStatPending list and
>> snapshots each entry's current counts into an xact_baseline field via
>> struct assignment. It is called from AtEOXact_PgStat() (after folding
>> transactional counts and removing dropped entries) and from
>> PostPrepare_PgStat() (after relation cleanup), covering commit, abort,
>> and PREPARE TRANSACTION. The view accessors unconditionally subtract
>> the baseline. For entries created in the current transaction,
>> xact_baseline is zero-initialized, so the subtraction is a no-op.
>>
>> I don’t have a clear preference between the two approaches; both are
>> presented for review.
>>

It would be useful to verify the fix by manually applying the patch
and building the instance. Additionally, a few issues surfaced after
looking at it again, which I will update later.

--
Best,
Xuneng

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Ayush Tiwari 2026-05-07 14:44:52 Re: BUG #19418: SQL/JSON JSON_VALUE() does not conform to ISO/IEC 9075-2:2023(E) 6.34 <JSON value constructor>
Previous Message David G. Johnston 2026-05-06 12:53:58 Re: BUG #19472: CAST(-32768::SMALLINT AS REAL) fails with "SMALLINT out of range" but -32768 is valid SMALLINT value