From: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jim Nasby <jnasby(at)upgrade(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Andrei Zubkov <zubkov(at)moonset(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, a(dot)lepikhov(at)postgrespro(dot)ru, Sami Imseih <samimseih(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: Vacuum statistics
Date: 2025-09-25 15:10:12
Message-ID: 74fcdd4c-5fcc-411b-8667-87a798d762e1@yandex.ru
Lists: pgsql-hackers
Hi all,
I’ve prepared an extension that adds vacuum statistics [0] (master
branch), and it’s working stably. The attached patch is a core patch
that enables this extension to work.
Right now I’m experimenting with the core patch. Specifically, in
load_file I can detect whether vacuum_statistics is listed in
shared_preload_libraries and, if so, start collecting vacuum statistics
in core.
However, I think it would be more reliable to simply add a dedicated
hook for vacuum statistics collection in core. In my view, an extension
may be loaded but have vacuum statistics collection disabled, and in
that case we shouldn’t gather the statistics at all.
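To make this concrete, here is a minimal sketch of the kind of dedicated
hook I have in mind; the hook name, the counters struct, and the enabling
GUC are hypothetical and not part of the attached patch:

/* In a core header, e.g. commands/vacuum.h (names are illustrative). */
typedef struct VacuumStatsCounters
{
    Oid     dboid;          /* database being vacuumed */
    Oid     reloid;         /* table or index just processed */
    bool    is_index;
    double  total_time;     /* total vacuum time, in ms */
    double  delay_time;     /* time spent in cost-based delays, in ms */
    int64   tuples_deleted;
} VacuumStatsCounters;

typedef void (*vacuum_statistics_hook_type) (const VacuumStatsCounters *counters);
extern PGDLLIMPORT vacuum_statistics_hook_type vacuum_statistics_hook;

/* In the extension: install the hook from _PG_init() only when collection
 * is enabled, so a loaded-but-disabled library gathers nothing. */
void
_PG_init(void)
{
    if (vacuum_statistics_enabled)      /* hypothetical boolean GUC */
        vacuum_statistics_hook = vacstats_collect;
}

Core would then call the hook, if it is set, at the end of vacuuming a
relation, passing the filled-in counters.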
In general, I’m not entirely happy with the current organization. One
issue that bothers me is having to scan the entire hash table and
aggregate its entries just to report vacuum statistics for one database.
At the moment, the hash table uses (dboid, reloid, type) as the key.
This could be improved by introducing a second hash table keyed by dboid,
whose entries hold arrays of the first table’s keys (dboid, reloid, type)
(with dboid either kept or omitted). The idea is that we find the array
for a given database and then aggregate its statistics by looking up only
those keys in the first table, instead of scanning it completely. I’ve
started implementing this approach in the main branch of the same
repository, but I’m still working out the issues with dynamic memory
management.
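For illustration, the two-level layout could look roughly like this
(struct and field names are mine, not taken from the patch):

/* First table: one entry per (dboid, reloid, type). */
typedef struct VacStatsKey
{
    Oid     dboid;
    Oid     reloid;
    char    type;           /* e.g. 't' = table, 'i' = index */
} VacStatsKey;

typedef struct VacStatsEntry
{
    VacStatsKey key;
    /* per-relation vacuum counters ... */
} VacStatsEntry;

/* Second table: one entry per database.  It stores only the keys of that
 * database's relations, so per-database aggregation looks up exactly those
 * keys in the first table instead of scanning it entirely. */
typedef struct VacStatsDBEntry
{
    Oid          dboid;
    int          nkeys;     /* used slots in keys[] */
    int          maxkeys;   /* allocated slots in keys[] */
    VacStatsKey *keys;      /* grown on demand -- this is the dynamic
                             * memory management that still needs work */
} VacStatsDBEntry;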
I also have an idea for splitting the statistics into “core” and “extra”
sets (a possible struct layout for the table-level counters is sketched
below).

Core statistics:
For databases (also collected for tables and indexes): delay_time,
total_time
For tables: pages_scanned, pages_removed, tuples_deleted,
vm_new_frozen_pages, vm_new_visible_pages
For indexes: tuples_deleted, pages_deleted

Extra statistics:
For databases (also collected for tables and indexes): total_blks_read,
total_blks_dirtied, total_blks_written, blks_fetched, blks_hit,
blk_read_time, blk_write_time
For tables: recently_dead_tuples, missed_dead_tuples,
vm_new_visible_frozen_pages, missed_dead_pages, tuples_frozen
WAL statistics (collected separately for databases and relations):
wal_records, wal_fpi, wal_bytes
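For the table-level counters, the split could look roughly like this
(struct names are mine; the fields are just the ones listed above):

typedef struct VacTableStatsCore
{
    double  delay_time;
    double  total_time;
    int64   pages_scanned;
    int64   pages_removed;
    int64   tuples_deleted;
    int64   vm_new_frozen_pages;
    int64   vm_new_visible_pages;
} VacTableStatsCore;

typedef struct VacTableStatsExtra
{
    int64   total_blks_read;
    int64   total_blks_dirtied;
    int64   total_blks_written;
    int64   blks_fetched;
    int64   blks_hit;
    double  blk_read_time;
    double  blk_write_time;
    int64   recently_dead_tuples;
    int64   missed_dead_tuples;
    int64   vm_new_visible_frozen_pages;
    int64   missed_dead_pages;
    int64   tuples_frozen;
    /* WAL usage, collected separately for databases and relations */
    int64   wal_records;
    int64   wal_fpi;
    uint64  wal_bytes;
} VacTableStatsExtra;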
I’ve already started drafting the first implementation, but I still need
to carefully handle memory allocation.
Additionally, I’m considering letting users define which databases,
schemas, or relations should have vacuum statistics collected. I believe
this could be valuable for large, high-load systems.
For example, the core statistics might show that a particular database
is frequently vacuumed - so we could then focus on tracking only that
one. Similarly, if certain tables are heavily updated by backends and
vacuumed often, we could target those specifically. Conceptually, this
would act like a filter, but at this point, it’s just an idea for a
future improvement.
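As a rough illustration only, such a filter could be exposed through a
list-valued GUC in the extension (the setting name and variable are
hypothetical):

#include "postgres.h"
#include "utils/guc.h"

/* Empty list means "collect vacuum statistics for everything". */
static char *vacstats_track_relations = NULL;

static void
vacstats_define_gucs(void)
{
    DefineCustomStringVariable("vacuum_statistics.track_relations",
                               "Comma-separated list of relations to collect "
                               "vacuum statistics for (empty means all).",
                               NULL,
                               &vacstats_track_relations,
                               "",
                               PGC_SIGHUP,
                               GUC_LIST_INPUT,
                               NULL, NULL, NULL);
}

The collection code would then check the current relation against this
list before storing anything.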
This is the direction I’m planning to take with the patch. If you have
alternative ideas about how to organize the code, I’d be glad to hear them!
On 25.09.2025 03:03, Bharath Rupireddy wrote:
> Hi,
>
> On Mon, May 12, 2025 at 5:30 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Fri, May 9, 2025 at 5:34 PM Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>> I did a rebase and finished the part with storing statistics separately from the relation statistics - now it is possible to disable the collection of statistics for relationsh using gucs and
>>> this allows us to solve the problem with the memory consumed.
>>>
>> I think this patch is trying to collect data similar to what we do for
>> pg_stat_statements for SQL statements. So, can't we follow a similar
>> idea such that these additional statistics will be collected once some
>> external module like pg_stat_statements is enabled? That module should
>> be responsible for accumulating and resetting the data, so we won't
>> have this memory consumption issue.
>>
>> BTW, how will these new statistics be used to autotune a vacuum? And
>> do we need all the statistics proposed by this patch?
> Thanks for working on this. I agree with the general idea of having
> minimal changes to the core. I think a simple approach would be to
> have a hook in heap_vacuum_rel at the end, where vacuum stats are
> prepared in a buffer for emitting LOG messages. External modules can
> then handle storing, rotating, interpreting, aggregating (per
> relation/per database), and exposing the stats to end-users via SQL.
> The core can define a common data structure, fill it, and send it to
> external modules. I haven't had a chance to read the whole thread or
> review the patches; I'm sure this has been discussed.
>
As for how this may help databases in practice, I think it deserves a
separate thread once the vacuum statistics patch is pushed.
In short, such statistics are essential to understand the real impact of
vacuum on system load.
For example:
If vacuum runs very frequently on a table or index, this might point to
table or index bloat, or to overly aggressive configuration.
Conversely, if vacuum freezes or removes very few tuples, it may suggest
that vacuum is not aggressive enough, or that delays are set too high.
If missed_dead_pages and missed_dead_tuples are high compared to
tuples_deleted, that may indicate vacuum can’t obtain a cleanup lock on
some pages, or doesn’t retry because delays are set too high.
Statistics related to wraparound activity can also hint that autovacuum
settings require adjustment.
It’s also possible that this system could be made more automatic in the
future, but I haven’t fully worked out how yet. I think that discussion
belongs in a separate thread once vacuum statistics themselves are
committed.
[0] https://github.com/Alena0704/vacuum_statistics/tree/master
-------------
Best regards,
Alena Rybakina
Postgres Professional
Attachment: vacuum_statistics.patch (text/x-patch, 38.1 KB)