Re: Improve pg_stat_statements scalability

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: samimseih(at)gmail(dot)com, lukas(at)fittl(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improve pg_stat_statements scalability
Date: 2026-06-02 06:30:40
Message-ID: ah54kGcap-VuWd7y@paquier.xyz
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 01, 2026 at 01:58:58PM +0900, Kyotaro Horiguchi wrote:
> As far as I can see, the current implementation does not apply any
> throttling to calls of this API. In theory, a large number of backends
> could invoke it frequently and generate a high flush rate. For
> example, if 1000 backends were to call it once per second, that would
> result in 1000 shared-stats updates per second.
>
> Whether such a usage pattern would occur in practice is a separate
> question. However, given that existing pgstat code uses
> PGSTAT_MIN_INTERVAL to limit flush frequency, it seems to me that
> anytime stats should probably retain a similar restriction.

Hmm, OK. One cost in this decision is that it could randomly make the
tests randomly slower, which is one reason why this patch has been
added to this thread.

> Historically, flushing and freeing an entry were effectively the same
> decision because flushing only happened after transaction end. With
> anytime flushes, however, those become separate concerns. A callback
> may be able to flush everything that is appropriate for the current
> flush context, while the caller may still need to keep the entry
> around for later transaction-end processing.

Noted.

> That makes it hard to reason about checks such as
> pg_memory_is_all_zeros(&lstats->counts, ...). This check still appears
> to tell whether the counters themselves are zero, but the proposed
> change makes it look as if that is no longer enough to determine whether
> the entry is "empty", because entry lifetime is also folded into the
> callback result.
>
> I wonder whether it would be cleaner for the caller to make the lifetime
> decision, based on the kind of flush being performed, and for the
> callback to be told that flush context explicitly. For example, the
> callback could be passed whether this is an anytime flush or a
> transaction-boundary flush, flush the counters that are appropriate for
> that context, and then return whether any counters remain.

So you would suggest to extend the flush callback(s) with a context
value instead of having each flush callback do their own decisions
depending on if we are in a transaction block or not, right?

Another question that I would have for you is: do you think that we
should try to keep pgstat_report_stat() as the sole entry point for
the flush of the stats? Or should we try to keep the same approach as
what this patch is doing with a new routine that loops through the
flush callbacks? FWIW, I am kind of finding the approach of having a
single entry point rather than two as more appealing with the
long-term picture in mind. That's just a single take, where we could
just pgstat_report_stat() an extra argument based on the context where
it is called, lifting its current !IsTransactionOrTransactionBlock()
requirement.

As you are the original author of this area of this code, I'd be
really interested in your opinion here.
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-06-02 06:39:21 Re: Fix bug of CHECK constraint enforceability recursion
Previous Message Ashutosh Bapat 2026-06-02 06:28:07 Re: A few message wording/formatting cleanup patches