From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Sami Imseih <samimseih(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [Proposal] Adding callback support for custom statistics kinds |
Date: | 2025-10-23 03:33:02 |
Message-ID: | aPmh7uoW6_EmxFHd@paquier.xyz |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Oct 22, 2025 at 03:24:11PM -0500, Sami Imseih wrote:
> I'd like to propose $SUBJECT to serialize additional per-entry data beyond
> the standard statistics entries. Currently, custom statistics kinds can store
> their standard entry data in the main "pgstat.stat" file, but there is no
> mechanism for extensions to persist extra data stored in the entry. A common
> use case is extensions that register a custom kind and, besides
> standard counters,
> need to track variable-length data stored in a dsa_pointer.
Thanks for sending a proposal in this direction.
> A concrete use case is pg_stat_statements. If it were to use custom
> stats kinds to track statement counters, it could also track query text
> stored in DSA. The callbacks allow saving the query text referenced by the
> dsa_pointer and restoring it after a clean shutdown. Since DSA
> (and more specifically DSM) cannot be attached by the postmaster, an
> extension cannot use "on_shmem_exit" or "shmem_startup_hook"
> to serialize or restore this data. This is why pgstat handles
> serialization during checkpointer shutdown and startup, allowing a single
> backend to manage it safely.
Agreed that it would be better to split the query text in a file of
its own and now bloat the "main" pgstats file with this data, A real
risk is that many PGSS entries with a bunch of queries would cause the
file to be just full of the PGSS contents.
> I considered adding hooks to the existing pgstat code paths
> (pgstat_before_server_shutdown, pgstat_discard_stats, and
> pgstat_restore_stats), but that felt too unrestricted. Using per-kind
> callbacks provides more control.
Per-kind callbacks to control all that makes sense here.
> There are already "to_serialized_name" and "from_serialized_name"
> callbacks used to store and read entries by "name" instead of
> "PgStat_HashKey", currently used by replication slot stats. Those
> remain unchanged, as they serve a separate purpose.
>
> Other design points:
>
> 1. Filenames use "pgstat.<kind>.stat" based on the numeric kind ID.
> This avoids requiring extensions to provide names and prevents issues
> with spaces or special characters.
Hmm. Is that really what we want here? This pretty says that one
single custom kind would never be able use multiple files, ever.
> 2. Both callbacks must be registered together. Serializing without
> deserializing would leave orphaned files behind, and I cannot think of a
> reason to allow this.
Hmm. Okay.
> 3. "write_chunk", "read_chunk", "write_chunk_s", and
> "read_chunk_s" are renamed to "pgstat_write_chunk", etc., and
> moved to "pgstat_internal.h" so extensions can use them without
> re-implementing these functions.
Exposing the write and read chunk APIs and renaming them sounds good
here, designed as they are now with a FILE* defined by the caller.
It's good to share these for consistency across custom and built-in
stats kinds.
> 4. These callbacks are valid only for custom, variable-numbered statistics
> kinds. Custom fixed kinds may not benefit, but could be considered in the
> future.
Pushing custom data for fixed-sized stats may be interesting, though
like you I am not sure what a good use-case would look like. So
discarding this case for now sounds fine to me.
> Attached 0001 is the proposed change, still in POC form.
Hmm. I would like to propose something a bit more flexible,
refactoring and reusing some of the existing callbacks, among the
following lines:
- Rather than introducing a second callback able to do more
serialization work, let's expand a bit the responsibility of
to_serialized_name and from_serialized_name to be able to work in a
more extended way, renaming them to "to/from_serialized_entry", which
are now limited to return a NameData with pgstat.c enforcing the data
written to the pgstats to be of NAMEDATALEN. The idea would be to let
the callbacks push some custom data where they want.
- The to_serialized_name path of pgstat_write_statsfile() would then
be changed as follows:
-- push a PGSTAT_FILE_ENTRY_NAME
-- Write the key write_chunk_s.
-- Call the callback to push some custom per-entry data.
-- Finish with the main chunk of data, of size pgstat_get_entry_len().
- The fd or FILE* of the "main" pgstats file should be added as
argument of both routines (not mandatory, but we are likely going to
need that if we want to add more "custom" data in the main pgstats
file before writing or reading a chunk). For example, for a PGSS text
file, we would likely write two fields to the main data file: an
offset and a length to be able to retrieve a query string, from a
secondary file.
- FDs where the data is written while we are in the to/from serialize
can be handled within the code paths specific to the stats kind code.
The first time a serialized callback of a stats kind is called, the
extra file(s) is(are) opened. This may come at the cost of one new
callback: at the end of the read and writes of the stats data, we
would need an extra look that's able to perform cleanup actions, which
would be here to make sure that the fds opened for the extra files are
closed when we are done. The close of each file is equivalent to the
pgstat_close_file() done in the patch, except that we'd loop over a
callback that would do the cleanup job once we are done reading or
writing a file. One step that can be customized in this new "end"
callback is if a stats kind may decide to unlink() a previous file, as
we do for the main pgstats file, or keep one or more files around.
That would be up to the extension developer. We should be able to
reuse or rework reset_all_cb() with a status given to it, depending on
if we are dealing with a failure or a success path. Currently,
reset_all_cb() is only used in a failure path, the idea would be to
extend it for the success case.
> The second patch
> contains tests in "injection_points" to demonstrate this proposal, and is not
> necessarily intended for commit.
Having coverage for these kinds of APIs is always good, IMO. We need
coverage for extension code.
--
Michael
From | Date | Subject | |
---|---|---|---|
Next Message | Richard Guo | 2025-10-23 03:43:36 | Re: Issue with query_is_distinct_for() and grouping sets |
Previous Message | Peter Smith | 2025-10-23 03:28:05 | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |