Re: Vacuum statistics

From: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jim Nasby <jnasby(at)upgrade(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Sami Imseih <samimseih(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Andrei Zubkov <zubkov(at)moonset(dot)ru>, Andrei Lepikhov <lepihov(at)gmail(dot)com>
Subject: Re: Vacuum statistics
Date: 2026-06-18 03:32:27
Message-ID: a9f3cf56-b44b-4e98-aa6d-26fd431a772d@yandex.ru
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Alexander! Thank you for the careful review. All five points are
addressed in the attached,
reworked patch set (now split into 8 per-category patches).

On 16.06.2026 20:30, Alexander Korotkov wrote:
> Some notes about it.
> 1) PgStat_CommonCounts.interrupts_count is never incremented.
Fixed. It's now incremented from vacuum_error_callback() (when
geterrlevel() == ERROR and a relation is set) via the new
pgstat_report_vacuum_error(), which bumps interrupts_count in the
PGSTAT_KIND_VACUUM_DB shmem entry. It writes
directly to shared memory because the vacuum's transaction is aborting,
and is reported at the database level
(pg_stat_vacuum_database). New TAP test 015_vacuum_stats_interrupts.pl
cancels a running VACUUM and checks
the counter advances.

(patch 0004)
> 2) LVRelState.wraparound_failsafe_count can only be incremented once
> in lazy_check_wraparound_failsafe(). Should we replace it with a
> bool?
Done. LVRelState.wraparound_failsafe is now a bool, set to true in
lazy_check_wraparound_failsafe(). The per-relation view reports it as a
0/1 flag reflecting
the latest vacuum, and the per-database aggregate sums those flags into
a count. This metric
is covered by 005_vacuum_stats_failsafe.pl (xid_wraparound module),
which burns enough
XIDs to engage the failsafe and asserts the counters advance.
(patch 0004)
> 3) Vacuum statistics isn't counted correctly for parallel index
> vacuum. parallel_vacuum_process_one_index() doesn't call neither
> extvac_stats_start_idx(), extvac_stats_end_idx(), or
> extvac_accumulate_idx_repor(). This leads all index vacuum stats to
> go the heap stats. I also think you need a reliable test cases to
> cover issues like this.
Fixed. parallel_vacuum_process_one_index() now samples each index pass with
extvac_stats_start_idx() / extvac_stats_end_idx() and accumulates it via
extvac_accumulate_idx_report() into the index's DSM-resident totals. The
leader reports those
totals once per index in parallel_vacuum_end() and feeds them back
through idx_heap_total, so
the per-index buffer/WAL/timing usage is subtracted from the parent heap
instead of being charged to it.

For the reliable test, I added a TAP test
(src/test/modules/test_misc/t/016_vacuum_stats_parallel.pl).
It forces the parallel path and verifies it was actually taken by
checking the VACUUM (VERBOSE) output for a launched
parallel worker. It then vacuums two identical tables - one with
PARALLEL 2, one with PARALLEL 0 - and asserts that
the per-index statistics (tuples_deleted, pages_deleted, wal_records)
are identical between the two paths,
which is exactly the invariant that was broken before: the parallel path
now neither loses the index stats nor
charges them to the heap.
(machinery + parallel sampling and the parallel test in patch 0003, "WAL
metrics and the sampling machinery")
> 4) It seems that pgstat_vacuum_db_flush_cb() and
> pgstat_vacuum_relation_flush_cb() are unused as
> pgstat_report_vacuum_extstats() reports directly to shmem. What was
> your intention here?

Good catch! Fixed! pgstat_report_vacuum_extstats() now accumulates into
pending entries with
pgstat_prep_pending_entry() for both PGSTAT_KIND_VACUUM_RELATION and
PGSTAT_KIND_VACUUM_DB,
which pgstat_report_stat() then flushes through the registered
pgstat_vacuum_relation_flush_cb() /
pgstat_vacuum_db_flush_cb().

(relation pending in patch 0001; database pending in patch 0003)
> 5) Commit message of 0001 uses stale names
> rev_all_visible_pages/rev_all_frozen_pages counters for
> visible_page_marks_cleared` / `frozen_page_marks_cleared.

Fixed.

A few other changes in this version, beyond the five points above:

* I split the patch set so that each commit introduces one category of
statistics.
In particular, recently_dead_tuples now lives in the same commit as
missed_dead_tuples (patch 0003), where
it logically belongs - both are dead-tuple counters derived the same
way, so keeping them together makes
the series easier to review.
* I also changed the order in which the counters are added across the
commits. I'm still working out the right way
to verify the buffer statistics, timings and he frozen /
visibility-map counters.

Thanks again for the review.

--
-----------
Best regards,
Alena Rybakina,
Yandex Cloud

Attachment Content-Type Size
0001-Track-table-VM-stability.patch text/plain 21.7 KB
0002-Extended-vacuum-statistics-core-heap-and-tuple-metri.patch text/plain 45.9 KB
0003-Extended-vacuum-statistics-recently-dead-and-missed-.patch text/plain 15.6 KB
0004-Extended-vacuum-statistics-WAL-metrics-and-the-sampl.patch text/plain 72.5 KB
0005-Extended-vacuum-statistics-interrupted-vacuums-and-t.patch text/plain 27.5 KB
0006-Extended-vacuum-statistics-visibility-map-page-trans.patch text/plain 13.7 KB
0007-Extended-vacuum-statistics-total-shared-buffer-acces.patch text/plain 29.1 KB
0008-Extended-vacuum-statistics-per-relation-buffer-acces.patch text/plain 17.8 KB
0009-Extended-vacuum-statistics-timing-metrics-for-tables.patch text/plain 38.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2026-06-18 03:58:35 Re: Proposal: Conflict log history table for Logical Replication
Previous Message Chao Li 2026-06-18 03:24:04 pgbench --continue-on-error: clarify TPS and failure reporting