| From: | Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> |
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jim Nasby <jnasby(at)upgrade(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Sami Imseih <samimseih(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Andrei Zubkov <zubkov(at)moonset(dot)ru>, Andrei Lepikhov <lepihov(at)gmail(dot)com> |
| Subject: | Re: Vacuum statistics |
| Date: | 2026-04-28 05:28:45 |
| Message-ID: | 9d98562b-fa97-476b-9315-6fa2d736ab6b@yandex.ru |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 28.04.2026 05:16, Alena Rybakina wrote:
> Hi, all!
>
> I have updated the core patch that implements the machinery for
> collecting extended vacuum statistics (I didn't touch the first patch
> that is ready for commit, only patches that are related to extension),
> and rebased the ext_vacuum_statistics extension on top of it. The
> split is intentional: the core only gathers metrics and hands them
> out, while the actual storage and SQL-level access to the statistics
> live entirely in the extension. If the extension is not loaded, the
> overhead is essentially zero - we only fill a small struct on the
> stack and do a NULL check on the hook.
>
> What was updated in the core
>
> The core gains the machinery and the hook through which the extension
> receives metrics after each vacuum.
>
> The hook. A new hook has been added in pgstat -
> set_report_vacuum_hook. It is fired once per vacuumed table and once
> per vacuumed index, plus when forming the per-database aggregate. The
> extension registers its handler in _PG_init and by default the hook is
> NULL, so without an extension the core behaves exactly as before.
>
> The set of statistics is the same as before. Common to tables, indexes
> and the database - hits and misses in shared buffers, number of
> dirtied and written pages, WAL volume, buffer read and write times,
> sleep time spent in delay points, total wall-clock vacuum time
> (including I/O and lock waits), counter of emergency anti-wraparound
> vacuums, number of interrupts and removed tuples. Tables additionally
> report frozen tuples, pages marked all-frozen / all-visible in the
> visibility map, number of scanned and removed pages, number of index
> passes, etc. Indexes report freed pages.
>
> The least obvious part of the implementation is subtracting index
> statistics from the table statistics. This is the bit worth
> highlighting. The thing is that indexes are vacuumed before the heap,
> and the buffer and WAL statistics that we capture at the heap level by
> the end of the heap vacuum already include everything that was spent
> on the indexes. If we simply expose the diff of
> pgBufferUsage/pgWalUsage between start and end, the table ends up with
> double-counted pages/WAL: once in its own report, and a second time
> inside the reports of its indexes. This is especially noticeable with
> parallel index vacuum: workers accumulate their usage in the leader
> only after they finish, so without subtraction the heap report would
> receive the combined cost of all workers as a "bonus".
>
> To handle this, as each index finishes vacuuming, its counters are
> accumulated into the state of the current operation, and at the moment
> the heap report is built these sums are subtracted out. As a result,
> the extension receives clean numbers: "this is what was actually spent
> on the table itself", and separately "this is what was actually spent
> on each index". The behaviour is idempotent for both serial and
> parallel vacuum.
>
> The ext_vacuum_statistics extension
>
> The extension registers the hook handler and stores the received data
> through the pgstat custom statistics infrastructure. That is, vacuum
> counters are kept not in the extension's own files, but together with
> the regular cumulative statistics - they survive a restart and are
> reset together with pg_stat_reset_*. Access is provided through three
> views: one for tables, one for indexes, and one with the per-database
> aggregate.
>
> Filtering
>
> This is where the main flexibility lives - the extension does not
> force "collect everything", but lets you choose both what to track and
> which metrics to keep.
>
> By object type. You can limit collection to databases only (without
> per-table detail), to tables only, or collect both. Among tables, you
> can additionally filter system / user / all.
>
> By an explicit list. An alternative to "by type" is a whitelist: you
> turn the corresponding mode on, and the extension starts collecting
> statistics only for the databases and tables that were explicitly
> registered via add_track_database / add_track_relation (with matching
> remove_* for removal). When the lists are off, the type filter is in
> effect; when they are on, only the list applies. This is convenient
> when you are interested in monitoring specific "hot" tables and do not
> want to spend memory on statistics for everything else.
> This list is persisted to disk, and there is one more non-trivial part
> here. List changes are concurrent - multiple sessions may call
> add_track_* simultaneously, plus there is an object-access hook that
> cleans the entry on DROP. To avoid ending up with a torn file, access
> to the list is serialized via a dedicated LWLock tranche (requested
> from a shmem_request_hook), and the file itself is written atomically:
> first into a temporary file, then fflush + pg_fsync + durable_rename.
> All I/O return codes are checked; on error the temporary file is
> removed and the real one is left untouched; PG_TRY/PG_CATCH guarantees
> cleanup on ereport(ERROR). Reading the list takes the same lock in
> shared mode, so a concurrent write cannot tear the load.
>
> By metric category. There is also a GUC that takes a list and turns on
> the categories of interest - buffers, WAL, general counters, timings
> (or all). Unwanted categories are simply skipped on the hook handler
> side and never make it into the pgstat entry, which reduces the
> overhead of the handler itself. This is useful when, for example, only
> timings are needed - in that case the extension does not waste time
> copying the buffer and WAL fields.
>
> Privileges. The add_track_* / remove_track_* functions require
> superuser or pg_read_all_stats. At the SQL level, EXECUTE is revoked
> from PUBLIC and granted only to pg_read_all_stats, so a regular user
> has no access to mutating the list. The views are unrestricted, like
> regular statistics.
>
> What is in the patches
>
> 0002-Machinery-for-grabbing-extended-vacuum-statistics.patch - the
> machinery in the core plus the hook.
> 0003-ext_vacuum_statistics-...patch - the extension itself, filtering,
> views, tests.
>
I noticed CI's complaints during extension installation and fixed it.
--
-----------
Best regards,
Alena Rybakina
Yandex Cloud
| Attachment | Content-Type | Size |
|---|---|---|
| v39-0001-Track-table-VM-stability.patch | text/plain | 21.7 KB |
| v39-0002-Machinery-for-grabbing-extended-vacuum-statistics.patch | text/plain | 25.0 KB |
| v39-0003-ext_vacuum_statistics-extension-for-extended-vacuum-.patch | text/plain | 145.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Peter Smith | 2026-04-28 05:46:34 | Re: StringInfo fixes, v19 edition. Plus a few oddities |
| Previous Message | Michael Paquier | 2026-04-28 05:13:31 | Re: Define DatumGetInt8 function. |