| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Nikolay Samokhvalov <nik(at)postgres(dot)ai> |
| Cc: | pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Rafael Thofehrn Castro <rafaelthca(at)gmail(dot)com> |
| Subject: | Re: xact_rollback spikes when logical walsender exits |
| Date: | 2026-04-21 06:38:24 |
| Message-ID: | CALDaNm3oe1=gC=zJAn6Px06mFmGr+Bw83gw54uKmXpDSzkciuA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Fri, 17 Apr 2026 at 20:45, Nikolay Samokhvalov <nik(at)postgres(dot)ai> wrote:
>
> Hi hackers,
>
> There is a bug on logical-replication publishers where every decoded
> committed transaction bumps pg_stat_database.xact_rollback.
> ReorderBufferProcessTXN() ends each decoded transaction with
> AbortCurrentTransaction() for catalog cleanup; in the walsender that
> is a top-level abort, so AtEOXact_PgStat_Database(isCommit=false)
> increments the backend-local pgStatXactRollback.
>
> The counts are flushed to shared stats on walsender exit, producing
> an acute spike. Result: for production systems with SREs on call and tight
> alerting on xact_rollback, this turns routine logical-replication operations
> (disabling a subscription, dropping a slot, walsender restart) into
> false-positive pages.
>
> Reported in [1]; also experienced at GitLab [2][3][4].
>
> Attaching a simple patch that adds a backend-local flag pgStatXactSkipCounters
> in pgstat_database.c that AtEOXact_PgStat_Database() honors to skip
> the counter bump.
>
> Added TAP test that fails on master with 5/0 and passes with the patch.
>
> If there is agreement on this shape, happy to send patches for all
> supported branches. Let me know what you think.

Thanks for reporting this and for the patch. The problem description
matches what I've observed as well. The current behavior can be
misleading, since these rollbacks correspond to internal decoding
cleanup rather than actual user-visible transaction aborts.

Another approach could be to introduce a wrapper around
AbortCurrentTransaction(), for example
AbortCurrentTransactionWithoutUpdateStats(), that skips the
AtEOXact_PgStat() call in this case.
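
To make the shape concrete, here is a rough, self-contained sketch of the wrapper idea. It is a toy model only: the counters, the transaction state, and every toy_* name are hypothetical stand-ins, not the real xact.c or pgstat code, and the real abort path does far more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for backend-local state; NOT PostgreSQL's real variables. */
static long toy_xact_commit = 0;
static long toy_xact_rollback = 0;
static bool toy_in_xact = false;

void toy_start_transaction(void)
{
    assert(!toy_in_xact);
    toy_in_xact = true;
}

/* Models the end-of-transaction stats hook: bump one of the two counters. */
void toy_at_eoxact_stats(bool is_commit)
{
    if (is_commit)
        toy_xact_commit++;
    else
        toy_xact_rollback++;
}

/* Shared abort path; the flag decides whether the stats hook runs. */
static void toy_abort_internal(bool update_stats)
{
    assert(toy_in_xact);
    /* ... resource and catalog cleanup would happen here ... */
    if (update_stats)
        toy_at_eoxact_stats(false);
    toy_in_xact = false;
}

/* Ordinary abort: counted as a rollback. */
void toy_abort_current_transaction(void)
{
    toy_abort_internal(true);
}

/* Hypothetical wrapper: same cleanup, but the rollback is not counted. */
void toy_abort_current_transaction_without_update_stats(void)
{
    toy_abort_internal(false);
}

/*
 * Models decoding n committed transactions, each ending with a cleanup
 * abort. Returns the rollback count accumulated by this call.
 */
long toy_decode_transactions(int n, bool skip_stats)
{
    toy_xact_rollback = 0;
    for (int i = 0; i < n; i++)
    {
        toy_start_transaction();
        if (skip_stats)
            toy_abort_current_transaction_without_update_stats();
        else
            toy_abort_current_transaction();
    }
    return toy_xact_rollback;
}
```

In this model, decoding with the ordinary abort counts one rollback per decoded transaction, while decoding with the wrapper counts none. Only the decoding call site would switch to the wrapper, so ordinary aborts elsewhere keep their accounting.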
Thoughts?

Regards,
Vignesh