Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: klemen(dot)kobau(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
Date: 2026-03-29 03:17:23
Message-ID: CABPTF7VSNTvHTpS4ga1MfLYO-=xt7mLQ0ONEFobRERnhzXSzxw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Sat, Mar 28, 2026 at 1:47 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> On Fri, Mar 27, 2026 at 10:15 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> > Hi klemen,
> >
> > Thanks for the report.
> >
> > On Fri, Mar 27, 2026 at 5:36 AM PG Bug reporting form
> > <noreply(at)postgresql(dot)org> wrote:
> > >
> > > The following bug has been logged on the website:
> > >
> > > Bug reference: 19439
> > > Logged by: klemen kobau
> > > Email address: klemen(dot)kobau(at)gmail(dot)com
> > > PostgreSQL version: 18.0
> > > Operating system: Linux (EndeavorOS)
> > > Description:
> > >
> > > I am running postgres:18.0 in a docker container, the configuration is as
> > > follows:
> > >
> > > postgres:
> > > image: postgres:18.0
> > > command: [
> > > "postgres",
> > > "-N", "200",
> > > "-c", "max_prepared_transactions=100",
> > > "-c", "wal_level=logical",
> > > "-c", "max_wal_senders=10",
> > > "-c", "max_replication_slots=20",
> > > "-c", "wal_keep_size=10",
> > > "-c", "max_slot_wal_keep_size=1024"
> > > ]
> > > environment:
> > > POSTGRES_USER: postgres
> > > POSTGRES_PASSWORD: postgres
> > > POSTGRES_DB: postgres
> > > TZ: UTC
> > > PGTZ: UTC
> > > ports:
> > > - 5432:5432
> > > volumes:
> > > - postgres_data:/var/lib/postgresql
> > >
> > > I use psql version 18.3.
> > >
> > > I run the following:
> > >
> > > psql -h localhost -p 5432 -U postgres -d postgres -c "
> > > -- Transaction 1: insert 1 row, check stats, commit
> > > BEGIN;
> > > CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY, val text);
> > > INSERT INTO temp.xact_test (val) VALUES ('a');
> > > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
> > > 'xact_test';
> > > COMMIT;
> > >
> > > -- Transaction 2: insert 1 row, check stats
> > > BEGIN;
> > > INSERT INTO temp.xact_test (val) VALUES ('b');
> > > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
> > > 'xact_test';
> > > ROLLBACK;
> > >
> > > -- Cleanup
> > > DROP TABLE temp.xact_test;
> > > "
> > >
> > > And the output is
> > >
> > > BEGIN
> > > CREATE TABLE
> > > INSERT 0 1
> > > relname | n_tup_ins
> > > -----------+-----------
> > > xact_test | 1
> > > (1 row)
> > >
> > > COMMIT
> > > BEGIN
> > > INSERT 0 1
> > > relname | n_tup_ins
> > > -----------+-----------
> > > xact_test | 2
> > > (1 row)
> > >
> > > ROLLBACK
> > > DROP TABLE
> >
> > I can also reproduce this behavior on HEAD.
> >
> > > This is not what I would expect from reading
> > > https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS,
> > > where it states
> > >
> > > > A transaction can also see its own statistics (not yet flushed out to the
> > > shared memory statistics) in the views pg_stat_xact_all_tables,
> > > pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
> > > pg_stat_xact_user_functions. These numbers do not act as stated above;
> > > instead they update continuously throughout the transaction.
> > >
> > > based on this, I would expect that the numbers are updated each time an
> > > insert happens and that they are isolated per transaction.
> > >
> >
> > This snippet of the doc feels somewhat ambiguous. The current behavior
> > does not seem fully consistent with wording such as “throughout the
> > transaction.” It also seems more appropriate for these table views to
> > reflect stats within their own scopes, rather than cumulative values
> > spanning txn boundaries.
> >
>
> I’ve looked into this issue and 'd like to propose a patch to address
> it. Feedback is very welcome.
>
> --- Root cause
>
> Since the stats subsystem was rewritten (commit 5891c7a8), each
> backend keeps a hash of pending stats entries that persist across
> transaction boundaries. Entries that flush successfully are deleted
> from the backend-local pending list; entries not flushed remain
> pending with their accumulated counters. Flushing is driven by
> pgstat_report_stat(), called from the ReadyForQuery path, and subject
> to a rate limiter.
>
> The pg_stat_xact_* views read these pending entries directly via
> find_tabstat_entry() / find_funcstat_entry(). Both functions return
> the raw accumulated counters without per-transaction scoping.
>
> The bug is deterministic when multiple top-level transactions are
> processed within a single simple-query message, because there is no
> ReadyForQuery boundary between the transactions and therefore no
> opportunity to flush and remove the pending entry.
>
> --- Fix
>
> The patch introduces a per-entry "transaction baseline" — a lazy
> snapshot of the counters taken the first time an entry is touched in
> each new top-level transaction. The accessor functions
> (find_tabstat_entry, find_funcstat_entry) then subtract the baseline
> from the current counters, yielding only the current transaction's
> delta.
>
> The baseline is keyed by MyProc->vxid.lxid. A static inline helper,
> pgstat_ensure_xact_baseline(), is called at every nontransactional
> counter-increment site (the 7 event-counter macros in pgstat.h and the
> 4 non-inline counter functions in pgstat_relation.c). After the first
> call per entry per transaction, the check reduces to a single integer
> comparison with a predictably not-taken branch.
>
> For function stats, a new PgStat_FunctionPending wrapper struct embeds
> PgStat_FunctionCounts at offset 0 with the baseline fields appended,
> so the flush callback requires only a trivial cast change.
>
> --- Testing
>
> The regression test is a TAP test under src/bin/psql/t/ that uses psql
> -c to send multi-statement strings as single simple-query messages.
> src/bin/psql/t look like the right existing harness for the primary
> regression because psql -c sends the whole multi-statement string as
> one simple-query message via simple_action_list dispatch. By contrast,
> ordinary pg_regress SQL files are executed by psql in file/stdin mode,
> which runs through MainLoop() and dispatches top-level statements one
> at a time via SendQuery(), creating ReadyForQuery boundaries that
> could mask the bug.
>
> The test covers three scenarios: table n_tup_ins and seq_scan
> counters, and function calls — each verifying isolation across
> consecutive top-level transactions within a single message.
>

postgres % cat output.txt
BEGIN
CREATE TABLE
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 1
(1 row)

COMMIT
BEGIN
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 2
(1 row)

ROLLBACK
DROP TABLE

I can also reproduce this behavior in v13. So this issue pre-existed
the major refactor 5891c7a8.

--
Best,
Xuneng

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2026-03-29 08:43:16 Re: BUG #19438: segfault with temp_file_limit inside cursor
Previous Message Thomas Munro 2026-03-28 07:03:14 Re: BUG #19416: Backend SIGSEGV in ExecShutdownHashJoin/ExecHashTableDetach/dsa_free