[Proposal] Adding callback support for custom statistics kinds

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: [Proposal] Adding callback support for custom statistics kinds
Date: 2025-10-22 20:24:11
Message-ID: CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I'd like to propose $SUBJECT to serialize additional per-entry data beyond
the standard statistics entries. Currently, custom statistics kinds can store
their standard entry data in the main "pgstat.stat" file, but there is no
mechanism for extensions to persist extra data stored in the entry. A common
use case is extensions that register a custom kind and, besides
standard counters,
need to track variable-length data stored in a dsa_pointer.

This proposal adds optional "to_serialized_extra" and
"from_serialized_extra" callbacks to "PgStat_KindInfo" that allow custom kinds
to write and read from extra data in a separate files
(pgstat.<kind>.stat). The callbacks
give extensions direct access to the file pointer so they can read and write
data in any format, while the core "pgstat" infrastructure manages
opening, closing, renaming, and cleanup, just as it does with "pgstat.stat".

A concrete use case is pg_stat_statements. If it were to use custom
stats kinds to track statement counters, it could also track query text
stored in DSA. The callbacks allow saving the query text referenced by the
dsa_pointer and restoring it after a clean shutdown. Since DSA
(and more specifically DSM) cannot be attached by the postmaster, an
extension cannot use "on_shmem_exit" or "shmem_startup_hook"
to serialize or restore this data. This is why pgstat handles
serialization during checkpointer shutdown and startup, allowing a single
backend to manage it safely.

I considered adding hooks to the existing pgstat code paths
(pgstat_before_server_shutdown, pgstat_discard_stats, and
pgstat_restore_stats), but that felt too unrestricted. Using per-kind
callbacks provides more control.

There are already "to_serialized_name" and "from_serialized_name"
callbacks used to store and read entries by "name" instead of
"PgStat_HashKey", currently used by replication slot stats. Those
remain unchanged, as they serve a separate purpose.

Other design points:

1. Filenames use "pgstat.<kind>.stat" based on the numeric kind ID.
This avoids requiring extensions to provide names and prevents issues
with spaces or special characters.

2. Both callbacks must be registered together. Serializing without
deserializing would leave orphaned files behind, and I cannot think of a
reason to allow this.

3. "write_chunk", "read_chunk", "write_chunk_s", and
"read_chunk_s" are renamed to "pgstat_write_chunk", etc., and
moved to "pgstat_internal.h" so extensions can use them without
re-implementing these functions.

4. These callbacks are valid only for custom, variable-numbered statistics
kinds. Custom fixed kinds may not benefit, but could be considered in the
future.

Attached 0001 is the proposed change, still in POC form. The second patch
contains tests in "injection_points" to demonstrate this proposal, and is not
necessarily intended for commit.

Looking forward to your feedback!

--

Sami Imseih
Amazon Web Services (AWS)

Attachment Content-Type Size
0002-test-custom-stats-callbacks.patch application/octet-stream 14.4 KB
0001-Add-callback-support-for-custom-statistics-extra-dat.patch application/octet-stream 20.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sami Imseih 2025-10-22 20:57:15 Re: Skip unregistered custom kinds on stats load
Previous Message Nathan Bossart 2025-10-22 20:04:11 fix type of infomask parameter in static inline functions