Re: Improve pg_stat_statements scalability

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, lukas(at)fittl(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improve pg_stat_statements scalability
Date: 2026-06-12 23:21:42
Message-ID: CAA5RZ0s5z2=8fZ3FMG9fs32JctuLX-WV4jJ5X_w04MMihw9Q-A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Sorry for the delay, and thank you for the comments!

> On Wed, Jun 03, 2026 at 05:10:39PM +0900, Kyotaro Horiguchi wrote:
> > One alternative would be to introduce a separate callback for anytime
> > flushes. However, since transaction-boundary flushes ultimately need
> > to flush everything anyway, it seems to me that passing a context flag
> > would be sufficient.
>
> Nah. I am not really in favor of an extra callback. I feel that this
> could lead to more mistakes when implementing new stats kinds, so
> doing your approach of using a value based on the context of the flush
> works for me.

I will proceed with this approach.

> > I tend to agree with the single-entry-point approach.
> >
> > The basic operation is common; the anytime case differs only in the
> > context in which the flush is requested. The flush-timing logic in
> > pgstat_report_stat() may become slightly more complicated with anytime
> > flushes, but after that point I think the existing path should
> > continue to work with only minor adjustments, by carrying an explicit
> > context value down to the callbacks.
>
> Yeah, same feeling here. That would be the way I'd use to approach
> this design, and to aim for one single entry point for all the stats
> reports and the flushes, for all the contexts.

We can keep the scope of the work for now limited to the testing
infrastructure and allow pg_stat_force_next_flush() to call
pgstat_report_stat() while within a transaction. This will not require
throttling.

> 0004: pg_stat_statements: modernize entry storage with pgstat kind

> The main patch. Replaces ShmemInitHash with dshash via DSM registry,

v3-0004 introduces a secondary dshash meant to track the query
dsa_pointer, but its larger purpose is to avoid having
pg_stat_statements_internal or eviction do a full scan on the
core dshash to look up its own entries; which means filtering
out entries for other kinds.

After discussing this approach with several people at recent
conferences (Lukas Fittl and Robert Haas), I think we should do
something more robust in the core stats system instead.

I have implemented an approach, which I will share in the next
revision, where every stats kind is written to its own dedicated
dshash rather than into the common one. This is a new
configuration in PgStat_KindInfo where an extension (or even a
core stats kind) can request to be placed in a dedicated hash,
else in the common hash.

I like this better, especially for kinds with unique access
patterns like pg_stat_statements, where the view function and
eviction need to iterate over all their own entries frequently.
A dedicated hash eliminates the full-scan problem without a
secondary coordination dshash.

The implementation is straightforward and does not require
significant API changes.

--
Sami Imseih
Amazon Web Services (AWS)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Khoa Nguyen 2026-06-12 23:47:56 Re: [PATCH] vacuumdb: Add --exclude-database option
Previous Message Tom Lane 2026-06-12 23:21:28 Re: Use \if/\endif to remove non-libxml2 expected output in regression tests