Quick Links

Re: Improve pg_stat_statements scalability

From:	Sami Imseih <samimseih(at)gmail(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, lukas(at)fittl(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Improve pg_stat_statements scalability
Date:	2026-06-12 23:21:42
Message-ID:	CAA5RZ0s5z2=8fZ3FMG9fs32JctuLX-WV4jJ5X_w04MMihw9Q-A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

Sorry for the delay, and thank you for the comments!

> On Wed, Jun 03, 2026 at 05:10:39PM +0900, Kyotaro Horiguchi wrote:
> > One alternative would be to introduce a separate callback for anytime
> > flushes. However, since transaction-boundary flushes ultimately need
> > to flush everything anyway, it seems to me that passing a context flag
> > would be sufficient.
>
> Nah. I am not really in favor of an extra callback. I feel that this
> could lead to more mistakes when implementing new stats kinds, so
> doing your approach of using a value based on the context of the flush
> works for me.

I will proceed with this approach.

> > I tend to agree with the single-entry-point approach.
> >
> > The basic operation is common; the anytime case differs only in the
> > context in which the flush is requested. The flush-timing logic in
> > pgstat_report_stat() may become slightly more complicated with anytime
> > flushes, but after that point I think the existing path should
> > continue to work with only minor adjustments, by carrying an explicit
> > context value down to the callbacks.
>
> Yeah, same feeling here. That would be the way I'd use to approach
> this design, and to aim for one single entry point for all the stats
> reports and the flushes, for all the contexts.

We can keep the scope of the work for now limited to the testing
infrastructure and allow pg_stat_force_next_flush() to call
pgstat_report_stat() while within a transaction. This will not require
throttling.

> 0004: pg_stat_statements: modernize entry storage with pgstat kind

> The main patch. Replaces ShmemInitHash with dshash via DSM registry,

v3-0004 introduces a secondary dshash meant to track the query
dsa_pointer, but its larger purpose is to avoid having
pg_stat_statements_internal or eviction do a full scan on the
core dshash to look up its own entries; which means filtering
out entries for other kinds.

After discussing this approach with several people at recent
conferences (Lukas Fittl and Robert Haas), I think we should do
something more robust in the core stats system instead.

I have implemented an approach, which I will share in the next
revision, where every stats kind is written to its own dedicated
dshash rather than into the common one. This is a new
configuration in PgStat_KindInfo where an extension (or even a
core stats kind) can request to be placed in a dedicated hash,
else in the common hash.

I like this better, especially for kinds with unique access
patterns like pg_stat_statements, where the view function and
eviction need to iterate over all their own entries frequently.
A dedicated hash eliminates the full-scan problem without a
secondary coordination dshash.

The implementation is straightforward and does not require
significant API changes.

--
Sami Imseih
Amazon Web Services (AWS)

In response to

Re: Improve pg_stat_statements scalability at 2026-06-04 04:50:59 from Michael Paquier

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Khoa Nguyen	2026-06-12 23:47:56	Re: [PATCH] vacuumdb: Add --exclude-database option
Previous Message	Tom Lane	2026-06-12 23:21:28	Re: Use \if/\endif to remove non-libxml2 expected output in regression tests