From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jim Nasby <jnasby(at)upgrade(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Andrei Zubkov <zubkov(at)moonset(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, a(dot)lepikhov(at)postgrespro(dot)ru, Sami Imseih <samimseih(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: Vacuum statistics |
Date: | 2025-09-04 16:18:21 |
Message-ID: | 79490c8c-2c35-464d-bb3e-3d2ea5aca9ab@postgrespro.ru |
Lists: | pgsql-hackers |
To be honest, I haven’t submitted an extension for PostgreSQL [0] to
-hackers before, nor have I encountered this situation in general. Just
in case, I created a public repository on GitHub with the code and added
a description in the README.
[0] https://github.com/Alena0704/vacuum_statistics#
On 04.09.2025 18:49, Alena Rybakina wrote:
> Hi, all!
>
> On 02.06.2025 19:50, Alena Rybakina wrote:
>>
>> On 02.06.2025 19:25, Alexander Korotkov wrote:
>>> On Tue, May 13, 2025 at 12:49 PM Alena Rybakina
>>> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>>> On 12.05.2025 08:30, Amit Kapila wrote:
>>>>> On Fri, May 9, 2025 at 5:34 PM Alena Rybakina
>>>>> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>>>>> I did a rebase and finished the part with storing statistics
>>>>>> separately from the relation statistics - now it is possible to
>>>>>> disable the collection of statistics for relations using GUCs,
>>>>>> which solves the memory-consumption problem.
>>>>>>
>>>>> I think this patch is trying to collect data similar to what we do
>>>>> for pg_stat_statements for SQL statements. So, can't we follow a
>>>>> similar idea such that these additional statistics will be
>>>>> collected once some external module like pg_stat_statements is
>>>>> enabled? That module should be responsible for accumulating and
>>>>> resetting the data, so we won't have this memory consumption issue.
>>>> The idea is good; it will require a single hook in the
>>>> pgstat_report_vacuum function, and the extvac_stats_start and
>>>> extvac_stats_end functions can be run when the extension is loaded,
>>>> so as not to add more hooks.
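To illustrate the single-hook idea in isolation: PostgreSQL hooks are plain function pointers that an extension fills in at load time, saving and chaining to any previous value. The sketch below is a standalone plain-C simulation of that convention; the hook name, its signature, and the extvac-side names are assumptions for illustration, not the actual pgstat API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook point; name and signature are illustrative,
 * not the real pgstat_report_vacuum interface. */
typedef void (*report_vacuum_hook_type)(unsigned int reloid,
                                        long tuples_removed);
static report_vacuum_hook_type report_vacuum_hook = NULL;

/* Core side: call the hook if an extension installed one. */
void
pgstat_report_vacuum_sim(unsigned int reloid, long tuples_removed)
{
    if (report_vacuum_hook)
        report_vacuum_hook(reloid, tuples_removed);
}

/* Extension side: save the previous hook and chain to it,
 * following the usual PostgreSQL hook convention. */
static report_vacuum_hook_type prev_report_vacuum_hook = NULL;
static long extvac_total_removed = 0;

static void
extvac_report_vacuum(unsigned int reloid, long tuples_removed)
{
    extvac_total_removed += tuples_removed;
    if (prev_report_vacuum_hook)
        prev_report_vacuum_hook(reloid, tuples_removed);
}

void
extvac_init(void)
{
    prev_report_vacuum_hook = report_vacuum_hook;
    report_vacuum_hook = extvac_report_vacuum;
}
```

Chaining to the saved previous hook keeps the extension composable with any other module that hooks the same point.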
>>> +1
>>> Nice idea of a hook. Given the volume of the patch, it might be a
>>> good idea to keep this as an extension.
>> Okay, I'll implement it and update the patch)
>>>
>>>> But I see a problem here with tracking dropped objects whose
>>>> statistics are no longer needed. There are two solutions, and to be
>>>> honest I don't like either of them.
>>>> The first is to add a background process that walks the table of
>>>> saved statistics, checks whether each relation or database still
>>>> exists, and deletes the vacuum statistics for those that don't.
>>>> This may be resource-intensive. The second is to add hooks to the
>>>> database- and relation-drop paths (the dropdb, index_drop, and
>>>> heap_drop_with_catalog functions).
>>> Can we workaround this with object_access_hook?
>>
>> I think this could fix the problem. For the OAT_DROP access type, we
>> can call a function to reset the vacuum statistics for relations that
>> are about to be dropped.
>>
>> At the moment, I don’t see any limitations to using this approach.
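A minimal model of that cleanup path, as a standalone plain-C simulation of the object_access_hook dispatch (the real hook also receives classId/subId and is chained like other hooks; the toy array below stands in for the shared statistics hash table):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of PostgreSQL's object access hook mechanism. */
typedef enum { OAT_POST_CREATE, OAT_DROP } ObjectAccessType;

typedef void (*object_access_hook_type)(ObjectAccessType access,
                                        unsigned int classId,
                                        unsigned int objectId,
                                        int subId, void *arg);
static object_access_hook_type object_access_hook = NULL;

/* Core side: fired just before an object is dropped. */
void
RunObjectDropHookSim(unsigned int classId, unsigned int objectId)
{
    if (object_access_hook)
        object_access_hook(OAT_DROP, classId, objectId, 0, NULL);
}

/* Extension side: a toy array standing in for the shared vacuum
 * statistics hash table. */
static unsigned int tracked_oid[8];
static int tracked_count = 0;

void
extvac_track(unsigned int oid)
{
    tracked_oid[tracked_count++] = oid;
}

static void
extvac_object_access(ObjectAccessType access, unsigned int classId,
                     unsigned int objectId, int subId, void *arg)
{
    (void) classId; (void) subId; (void) arg;

    if (access != OAT_DROP)
        return;

    /* Forget statistics for the object being dropped. */
    for (int i = 0; i < tracked_count; i++)
    {
        if (tracked_oid[i] == objectId)
        {
            tracked_oid[i] = tracked_oid[--tracked_count];
            break;
        }
    }
}

void
extvac_install_drop_hook(void)
{
    object_access_hook = extvac_object_access;
}
```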
>>
> I’ve prepared the first working version of the extension.
>
> I haven’t yet implemented writing the statistics to a file and
> reloading them into a hash table and shared memory at instance
> startup, and I also haven’t implemented a proper output for
> database-level statistics yet.
>
> I structured the extension as follows: statistics are stored in a hash
> table keyed by a composite key - database OID, relation OID, and
> object type (index, table, or database). When VACUUM or a worker
> processes a table or index, an exclusive lock is taken to update the
> corresponding record; a shared lock is taken when reading the
> statistics. For database-level output, I plan to compute the totals by
> summing table and index statistics on demand.
>
> To optimize that, I plan to keep entries in the hash table ordered by
> database OID. When accessing the first element by the partial key
> (database OID), I’ll scan forward and aggregate until the partial
> database key changes.
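The partial-key aggregation described above can be sketched as follows (struct and function names are assumptions; a linear skip stands in for the lookup of the first matching entry):

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
    unsigned int dboid;      /* database OID: major sort key */
    unsigned int reloid;     /* relation OID */
    long         tuples_removed;
} VacEntry;

/* Aggregate a database's total from entries kept ordered by dboid:
 * advance to the first entry with the requested partial key, then
 * scan forward until the database OID changes. */
long
db_total_removed(const VacEntry *entries, int n, unsigned int dboid)
{
    long total = 0;
    int i = 0;

    while (i < n && entries[i].dboid < dboid)
        i++;
    for (; i < n && entries[i].dboid == dboid; i++)
        total += entries[i].tuples_removed;
    return total;
}
```

The point of the ordering is only that aggregation can stop as soon as the partial key changes, rather than scanning the whole table.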
>
> Right now this requires adding the extension to
> `shared_preload_libraries`. I haven’t found a way to avoid that
> because of shared-memory setup, and I’m not sure it’s even possible.
>
> I’m also unsure whether it’s better to store the statistics in the
> cumulative statistics system (as done here) or entirely inside the
> extension. Note that the code added to the core to support the
> extension executes regardless of whether the extension is enabled.