| From: | Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> |
|---|---|
| To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jim Nasby <jnasby(at)upgrade(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Sami Imseih <samimseih(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Andrei Zubkov <zubkov(at)moonset(dot)ru>, Andrei Lepikhov <lepihov(at)gmail(dot)com> |
| Subject: | Re: Vacuum statistics |
| Date: | 2026-06-18 03:32:27 |
| Message-ID: | a9f3cf56-b44b-4e98-aa6d-26fd431a772d@yandex.ru |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi, Alexander! Thank you for the careful review. All five points are
addressed in the attached,
reworked patch set (now split into 8 per-category patches).
On 16.06.2026 20:30, Alexander Korotkov wrote:
> Some notes about it.
> 1) PgStat_CommonCounts.interrupts_count is never incremented.
Fixed. It's now incremented from vacuum_error_callback() (when
geterrlevel() == ERROR and a relation is set) via the new
pgstat_report_vacuum_error(), which bumps interrupts_count in the
PGSTAT_KIND_VACUUM_DB shmem entry. It writes
directly to shared memory because the vacuum's transaction is aborting,
and is reported at the database level
(pg_stat_vacuum_database). New TAP test 015_vacuum_stats_interrupts.pl
cancels a running VACUUM and checks
the counter advances.
(patch 0004)
> 2) LVRelState.wraparound_failsafe_count can only be incremented once
> in lazy_check_wraparound_failsafe(). Should we replace it with a
> bool?
Done. LVRelState.wraparound_failsafe is now a bool, set to true in
lazy_check_wraparound_failsafe(). The per-relation view reports it as a
0/1 flag reflecting
the latest vacuum, and the per-database aggregate sums those flags into
a count. This metric
is covered by 005_vacuum_stats_failsafe.pl (xid_wraparound module),
which burns enough
XIDs to engage the failsafe and asserts the counters advance.
(patch 0004)
> 3) Vacuum statistics isn't counted correctly for parallel index
> vacuum. parallel_vacuum_process_one_index() doesn't call neither
> extvac_stats_start_idx(), extvac_stats_end_idx(), or
> extvac_accumulate_idx_repor(). This leads all index vacuum stats to
> go the heap stats. I also think you need a reliable test cases to
> cover issues like this.
Fixed. parallel_vacuum_process_one_index() now samples each index pass with
extvac_stats_start_idx() / extvac_stats_end_idx() and accumulates it via
extvac_accumulate_idx_report() into the index's DSM-resident totals. The
leader reports those
totals once per index in parallel_vacuum_end() and feeds them back
through idx_heap_total, so
the per-index buffer/WAL/timing usage is subtracted from the parent heap
instead of being charged to it.
For the reliable test, I added a TAP test
(src/test/modules/test_misc/t/016_vacuum_stats_parallel.pl).
It forces the parallel path and verifies it was actually taken by
checking the VACUUM (VERBOSE) output for a launched
parallel worker. It then vacuums two identical tables - one with
PARALLEL 2, one with PARALLEL 0 - and asserts that
the per-index statistics (tuples_deleted, pages_deleted, wal_records)
are identical between the two paths,
which is exactly the invariant that was broken before: the parallel path
now neither loses the index stats nor
charges them to the heap.
(machinery + parallel sampling and the parallel test in patch 0003, "WAL
metrics and the sampling machinery")
> 4) It seems that pgstat_vacuum_db_flush_cb() and
> pgstat_vacuum_relation_flush_cb() are unused as
> pgstat_report_vacuum_extstats() reports directly to shmem. What was
> your intention here?
Good catch! Fixed! pgstat_report_vacuum_extstats() now accumulates into
pending entries with
pgstat_prep_pending_entry() for both PGSTAT_KIND_VACUUM_RELATION and
PGSTAT_KIND_VACUUM_DB,
which pgstat_report_stat() then flushes through the registered
pgstat_vacuum_relation_flush_cb() /
pgstat_vacuum_db_flush_cb().
(relation pending in patch 0001; database pending in patch 0003)
> 5) Commit message of 0001 uses stale names
> rev_all_visible_pages/rev_all_frozen_pages counters for
> visible_page_marks_cleared` / `frozen_page_marks_cleared.
Fixed.
A few other changes in this version, beyond the five points above:
* I split the patch set so that each commit introduces one category of
statistics.
In particular, recently_dead_tuples now lives in the same commit as
missed_dead_tuples (patch 0003), where
it logically belongs - both are dead-tuple counters derived the same
way, so keeping them together makes
the series easier to review.
* I also changed the order in which the counters are added across the
commits. I'm still working out the right way
to verify the buffer statistics, timings and he frozen /
visibility-map counters.
Thanks again for the review.
--
-----------
Best regards,
Alena Rybakina,
Yandex Cloud
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Track-table-VM-stability.patch | text/plain | 21.7 KB |
| 0002-Extended-vacuum-statistics-core-heap-and-tuple-metri.patch | text/plain | 45.9 KB |
| 0003-Extended-vacuum-statistics-recently-dead-and-missed-.patch | text/plain | 15.6 KB |
| 0004-Extended-vacuum-statistics-WAL-metrics-and-the-sampl.patch | text/plain | 72.5 KB |
| 0005-Extended-vacuum-statistics-interrupted-vacuums-and-t.patch | text/plain | 27.5 KB |
| 0006-Extended-vacuum-statistics-visibility-map-page-trans.patch | text/plain | 13.7 KB |
| 0007-Extended-vacuum-statistics-total-shared-buffer-acces.patch | text/plain | 29.1 KB |
| 0008-Extended-vacuum-statistics-per-relation-buffer-acces.patch | text/plain | 17.8 KB |
| 0009-Extended-vacuum-statistics-timing-metrics-for-tables.patch | text/plain | 38.7 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | shveta malik | 2026-06-18 03:58:35 | Re: Proposal: Conflict log history table for Logical Replication |
| Previous Message | Chao Li | 2026-06-18 03:24:04 | pgbench --continue-on-error: clarify TPS and failure reporting |