Re: Vacuum statistics

From: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jim Nasby <jnasby(at)upgrade(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Andrei Zubkov <zubkov(at)moonset(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Sami Imseih <samimseih(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
Subject: Re: Vacuum statistics
Date: 2026-03-12 17:57:34
Message-ID: 164c3354-5350-4c72-a6b5-a790f50c15ce@yandex.ru
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi)
Thank you for your valuable feedback!

On 12.03.2026 18:28, Andrei Lepikhov wrote:

> On 12/3/26 13:02, Andrei Lepikhov wrote:
>> On 9/3/26 16:46, Alena Rybakina wrote:
>>> I discovered that my last patches were incorrectly formed. I updated
>>> the correct version.
>>
>> I see that v29-0001-* is a quite separate feature itself at the
>> moment. It makes sense to remove the commit message phrase for
>> vm_new_frozen_pages and vm_new_visible_pages, introduced in later
>> patches.
>> This patch itself looks good to me.
>
> Since this patch is almost ready for commit, I reviewed it carefully.
> I noticed a documentation entry was missing, so I added it. Please see
> the attachment.
I have added it in the documentation in the extension that you have
noticed before, but I agree with your suggestion to move it in the core
patch.
> While updating the patch file, I also made a few small adjustments,
> including changing the parameter order in the struct and VIEW. The
> commit message is also fixed.
Thank you) I agree with your fixes)
>
> In addition, it makes sense to discuss how these parameters are
> supposed to be used. I see the following use cases:
>
> 1. Which tables have the most VM churn? - monitoring
> rev_all_visible_pages normalised on the table size and its average
> tuple width might expose the most suspicious tables (in terms of table
> statistics).
> 2. DML Skew. Dividing rev_all_visible_pages by the number of tuple
> updates/deletes, normalised by the average table and tuple sizes,
> might indicate whether changes are localised within the table.
> 3. IndexOnlyScan effectiveness. Considering the speed of
> rev_all_visible_pages change, normalised to the value of the
> relallvisible statistic, we may detect tables where Index-Only Scan
> might be inefficiently used.

Now it can be useful to track what table's pages are frozen by vacuum most.

By analyzing the ratio of frozen to unfrozen pages, you can see how well
the balance is maintained. Ideally, this ratio should approach 1. If we
have a higher ratio of unfrozen to frozen pages, it means the backand is
frequently accessing the table, which could indicate that this table
potentially requires attention to how well it's being handled by the
vacuum. There may be unpredictability or even a seasonal trend — a page
is frequently accessed only during certain periods (this is purely my
observation). Also, if the ratio of frozen pages is higher, the vacuum
may be configured too aggressively.

With the parameter that was included before (pg_class_relallfrozen and
relallvisible
https://github.com/MasaoFujii/postgresql/commit/99f8f3fbbc8f743290844e8c676d39dad11c5d5d)
in the pg_stat_tables, I think I can provide isolation test to prove it
- I can use my isolation test vacuum-extending-in-repetable-read.spec
that I have added in the extension (ext_vacuum_statistics). What do you
think?

>
> Feel free to criticise it or add your own - I’m just a developer, not
> a DBA. Also, I’m not sure what use cases there are for the
> rev_all_frozen_pages parameter.

Also, I would ask you if you don't mind to review the code in the
extension that I have provided to store and control vacuum statistics.
No one has ever looked at it unfortunately and any feedback is valuable
now.

In addition, I'm currently working on the parameter that can track some
parts of statistics. For example, we can track only buffer or wal
statistics. If you are interested, I'll send you the code on my github.
However I have already noticed that it requires to add dynamical memory
allocation based on the guc value. I know that it requires a lot of
attention in development but it will help to save memory during saving
statistics. What do you think about this idea? To be honest, it was
suggested before in this thread and I'm trying to realize it.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2026-03-12 18:09:19 Re: Add missing stats_reset column to pg_stat_database_conflicts view
Previous Message Bryan Green 2026-03-12 17:48:38 Re: Avoid multiple calls to memcpy (src/backend/access/index/genam.c)