Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Benoit Tigeot <benoit(dot)tigeot(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_statements: Add `calls_aborted` counter for tracking query cancellations
Date: 2025-08-14 08:18:44
Message-ID: aJ2b5ImfTEQgLZOg@paquier.xyz
Lists: pgsql-hackers

On Tue, Aug 12, 2025 at 04:54:10PM +0200, Benoit Tigeot wrote:
> I deliberately focused `calls_aborted` on executor-level failures
> rather than earlier phases (parser/analyzer/rewriter/planner) because
> they serve different operational purposes. Earlier phase failures are
> typically development-time errors (syntax mistakes, missing tables,
> type mismatches) that don't provide actionable operational insights.
> Executor aborts represent runtime operational issues (query timeouts,
> cancellations, resource exhaustion, lock conflicts, etc.) that
> indicate performance degradation or capacity problems requiring
> attention. This design keeps the feature focused on what matters for
> production monitoring: distinguishing between queries that "worked
> before but now fail operationally" versus "never worked due to code
> bugs." The implementation is also cleaner, avoiding the complexity of
> hooking multiple subsystems and classifying different error types but
> of course I may be wrong. ;)

That seems rather limited in scope to me. The executor is only one
part of the system. If I were to write a patch like the one you are
proposing, I would have considered using an xact callback fired when a
transaction is aborted, so as to count how many times a transaction
fails at a specific phase. You would need to know the latest query_id
in this case to be able to apply the counter update to the correct
entry (right?).
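A minimal sketch of that idea, for illustration only: register a
transaction callback and, on abort, bump a counter for the backend's
latest query ID. pgss_store_abort() is a hypothetical helper standing
in for whatever the patch uses to update its shared hash entry; the
rest uses the existing RegisterXactCallback() and
pgstat_get_my_query_id() interfaces.

```c
#include "postgres.h"

#include "access/xact.h"
#include "fmgr.h"
#include "utils/backend_status.h"	/* pgstat_get_my_query_id() */

PG_MODULE_MAGIC;

/* Hypothetical helper from the patch: bump calls_aborted for queryId */
extern void pgss_store_abort(uint64 queryId);

static void
pgss_xact_callback(XactEvent event, void *arg)
{
	if (event == XACT_EVENT_ABORT)
	{
		/* query ID of the last statement executed in this backend */
		uint64		queryId = pgstat_get_my_query_id();

		if (queryId != UINT64CONST(0))
			pgss_store_abort(queryId);
	}
}

void
_PG_init(void)
{
	RegisterXactCallback(pgss_xact_callback, NULL);
}
```

This would catch failures in any phase, not only the executor, at the
cost of attributing the abort to the last query ID computed, which may
not always be the statement that actually failed.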

+-- aborted calls tracking
+SELECT pg_sleep(0.5);
+ pg_sleep
+----------
+
+(1 row)

Using hardcoded sleep times in tests meant to be deterministic is
never a good idea. On fast machines, they waste time for nothing.
And if not written correctly, they may not achieve their goal on slow
machines, because the sleep threshold may be reached before the custom
action is taken. If you want to force a failure, you should just use
a query that you know will fail at execution time (based on what your
implementation expects).
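For instance, a sketch of a deterministic executor-time failure: a
constant expression like 1/0 is evaluated during planning by constant
folding, so the value has to come from table data for the error to be
raised by the executor.

```sql
-- Division happens in the executor because x is read from the table,
-- so this fails at execution time on every machine, with no sleep.
CREATE TEMP TABLE abort_test (x int);
INSERT INTO abort_test VALUES (0);
SELECT 1 / x FROM abort_test;  -- ERROR:  division by zero
```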
--
Michael
