From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jim Nasby <jnasby(at)upgrade(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Andrei Zubkov <zubkov(at)moonset(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, a(dot)lepikhov(at)postgrespro(dot)ru, Sami Imseih <samimseih(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: Vacuum statistics |
Date: | 2025-09-04 15:49:20 |
Message-ID: | 026722b5-6e61-44d8-816d-482ad24423a9@postgrespro.ru |
Lists: | pgsql-hackers |
Hi, all!
On 02.06.2025 19:50, Alena Rybakina wrote:
>
> On 02.06.2025 19:25, Alexander Korotkov wrote:
>> On Tue, May 13, 2025 at 12:49 PM Alena Rybakina
>> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>> On 12.05.2025 08:30, Amit Kapila wrote:
>>>> On Fri, May 9, 2025 at 5:34 PM Alena Rybakina
>>>> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>>>> I did a rebase and finished the part that stores these statistics
>>>>> separately from the relation statistics - it is now possible to
>>>>> disable their collection for relations using GUCs, which solves
>>>>> the memory consumption problem.
>>>>>
>>>> I think this patch is trying to collect data similar to what we do for
>>>> pg_stat_statements for SQL statements. So, can't we follow a similar
>>>> idea such that these additional statistics will be collected once some
>>>> external module like pg_stat_statements is enabled? That module should
>>>> be responsible for accumulating and resetting the data, so we won't
>>>> have this memory consumption issue.
>>> The idea is good. It would require a single hook in the
>>> pgstat_report_vacuum function; the extvac_stats_start and
>>> extvac_stats_end functions can then be run only when the extension
>>> is loaded, so as not to add more hooks (a sketch follows after the
>>> quoted discussion below).
>> +1
>> Nice idea of a hook. Given the volume of the patch, it might be a
>> good idea to keep this as an extension.
> Okay, I'll implement it and update the patch)
>>
>>> But I see a problem here with tracking dropped objects whose
>>> statistics are no longer needed. There are two solutions to this
>>> and, to be honest, I don't like either of them.
>>> The first is to add a background process that walks the table of
>>> saved statistics, checks whether each relation or database still
>>> exists, and deletes the vacuum statistics for those that don't. This
>>> may be resource-intensive. The second is to add hooks to the code
>>> paths that drop databases and relations (the dropdb, index_drop, and
>>> heap_drop_with_catalog functions).
>> Can we work around this with object_access_hook?
>
> I think this could fix the problem. For the OAT_DROP access type, we
> can call a function that resets the vacuum statistics for relations
> that are about to be dropped (see the sketch below).
>
> At the moment, I don’t see any limitations of this approach.
>
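To make the hook idea from the quoted discussion concrete, here is a
minimal sketch of what a single hook fired from pgstat_report_vacuum
could look like. The hook type name, its signature, and the extvac_*
callbacks are my illustration only, modeled on existing hooks such as
ProcessUtility_hook; they are not what the patch actually adds:

```c
#include "postgres.h"

/* Core side (sketch): one new hook, invoked at the end of
 * pgstat_report_vacuum().  Name and signature are hypothetical. */
typedef void (*pgstat_report_vacuum_hook_type) (Oid tableoid, bool shared);
pgstat_report_vacuum_hook_type pgstat_report_vacuum_hook = NULL;

/* Extension side (sketch): stubs for the functions mentioned above;
 * since the callback only runs when the extension is loaded, they
 * need no hooks of their own. */
static void extvac_stats_start(void) { /* begin collecting */ }
static void extvac_stats_end(void)   { /* publish the results */ }

static void
extvac_report_vacuum(Oid tableoid, bool shared)
{
    extvac_stats_start();
    /* ... accumulate the extended vacuum counters for tableoid ... */
    extvac_stats_end();
}

/* Installed from _PG_init():
 *     pgstat_report_vacuum_hook = extvac_report_vacuum;
 */
```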
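And for the drop-tracking part, a minimal sketch of the OAT_DROP
handler. object_access_hook, OAT_DROP, and the catalog OIDs are the
real core facilities; extvac_stats_forget() is a hypothetical
placeholder for the cleanup function:

```c
#include "postgres.h"
#include "catalog/objectaccess.h"
#include "catalog/pg_class_d.h"
#include "catalog/pg_database_d.h"

static object_access_hook_type prev_object_access_hook = NULL;

/* Hypothetical: would delete the extension's hash entries for the
 * dropped object. */
static void
extvac_stats_forget(Oid classId, Oid objectId)
{
    /* remove matching entries from the shared hash table (stub) */
}

static void
extvac_object_access(ObjectAccessType access, Oid classId,
                     Oid objectId, int subId, void *arg)
{
    if (prev_object_access_hook)
        (*prev_object_access_hook) (access, classId, objectId, subId, arg);

    /* Forget vacuum statistics when a whole relation or database is
     * dropped (subId != 0 would mean a column, which we ignore). */
    if (access == OAT_DROP && subId == 0 &&
        (classId == RelationRelationId || classId == DatabaseRelationId))
        extvac_stats_forget(classId, objectId);
}

/* Registered from _PG_init():
 *     prev_object_access_hook = object_access_hook;
 *     object_access_hook = extvac_object_access;
 */
```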
I’ve prepared the first working version of the extension.
I haven’t yet implemented writing the statistics to a file and
reloading them into the hash table in shared memory at instance
startup, nor a proper output for database-level statistics.
I structured the extension as follows: statistics are stored in a hash
table keyed by a composite key - database OID, relation OID, and object
type (index, table, or database). When VACUUM or a worker processes a
table or index, an exclusive lock is taken to update the corresponding
record; a shared lock is taken when reading the statistics. For
database-level output, I plan to compute the totals by summing table and
index statistics on demand.
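Roughly, the layout looks like this. This is a sketch with
illustrative names and two example counters only; the actual key and
entry structs in the patch may differ:

```c
#include "postgres.h"
#include "storage/lwlock.h"
#include "utils/hsearch.h"

typedef enum ExtVacObjectType
{
    EXTVAC_TABLE,
    EXTVAC_INDEX,
    EXTVAC_DB
} ExtVacObjectType;

typedef struct ExtVacStatsKey
{
    Oid         dboid;          /* database OID */
    Oid         reloid;         /* relation OID; InvalidOid for databases */
    ExtVacObjectType objtype;   /* index, table, or database */
} ExtVacStatsKey;

typedef struct ExtVacStatsEntry
{
    ExtVacStatsKey key;         /* hash key; must be first */
    int64       pages_scanned;  /* example counters only */
    int64       tuples_removed;
} ExtVacStatsEntry;

static HTAB    *extvac_hash;    /* lives in shared memory */
static LWLock  *extvac_lock;    /* protects extvac_hash */

/* Update path: VACUUM (or a worker) reports results under an
 * exclusive lock; readers take the lock in LW_SHARED mode instead. */
static void
extvac_update(const ExtVacStatsKey *key, int64 pages, int64 tuples)
{
    ExtVacStatsEntry *entry;
    bool        found;

    LWLockAcquire(extvac_lock, LW_EXCLUSIVE);
    entry = hash_search(extvac_hash, key, HASH_ENTER, &found);
    if (!found)
    {
        entry->pages_scanned = 0;
        entry->tuples_removed = 0;
    }
    entry->pages_scanned += pages;
    entry->tuples_removed += tuples;
    LWLockRelease(extvac_lock);
}
```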
To optimize that, I plan to keep entries in the hash table ordered by
database OID. When accessing the first element by the partial key
(database OID), I’ll scan forward and aggregate until the partial
database key changes.
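Until that ordering exists, the on-demand aggregation can be a plain
filtered scan over the hash table; a sketch building on the
hypothetical types above:

```c
/* On-demand totals for one database.  Note: a plain dynahash
 * sequential scan returns entries in no particular order, so this
 * version simply filters on the database OID; the ordered scan
 * described above would need a different structure. */
static void
extvac_db_totals(Oid dboid, int64 *pages, int64 *tuples)
{
    HASH_SEQ_STATUS seq;
    ExtVacStatsEntry *entry;

    *pages = 0;
    *tuples = 0;

    LWLockAcquire(extvac_lock, LW_SHARED);
    hash_seq_init(&seq, extvac_hash);
    while ((entry = (ExtVacStatsEntry *) hash_seq_search(&seq)) != NULL)
    {
        if (entry->key.dboid == dboid)
        {
            *pages += entry->pages_scanned;
            *tuples += entry->tuples_removed;
        }
    }
    LWLockRelease(extvac_lock);
}
```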
Right now this requires adding the extension to
`shared_preload_libraries`. I haven’t found a way to avoid that because
of shared-memory setup, and I’m not sure it’s even possible.
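For reference, the constraint comes from the shared-memory setup path:
on PG 15+ an extension must request its shared memory from
shmem_request_hook and initialize it in shmem_startup_hook, both
registered at preload time. A sketch, reusing the hypothetical
definitions from the earlier blocks (the sizing constant is likewise
illustrative):

```c
#include "postgres.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "utils/hsearch.h"

/* ExtVacStatsKey, ExtVacStatsEntry, extvac_hash and extvac_lock are
 * the hypothetical definitions from the sketches above. */

#define EXTVAC_MAX_ENTRIES 1024     /* illustrative sizing */

static shmem_request_hook_type prev_shmem_request_hook = NULL;
static shmem_startup_hook_type prev_shmem_startup_hook = NULL;

static void
extvac_shmem_request(void)
{
    if (prev_shmem_request_hook)
        prev_shmem_request_hook();
    RequestAddinShmemSpace(hash_estimate_size(EXTVAC_MAX_ENTRIES,
                                              sizeof(ExtVacStatsEntry)));
    RequestNamedLWLockTranche("extvac_stats", 1);
}

static void
extvac_shmem_startup(void)
{
    HASHCTL     info;

    if (prev_shmem_startup_hook)
        prev_shmem_startup_hook();

    memset(&info, 0, sizeof(info));
    info.keysize = sizeof(ExtVacStatsKey);
    info.entrysize = sizeof(ExtVacStatsEntry);

    LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
    extvac_hash = ShmemInitHash("extvac stats hash",
                                EXTVAC_MAX_ENTRIES, EXTVAC_MAX_ENTRIES,
                                &info, HASH_ELEM | HASH_BLOBS);
    extvac_lock = &(GetNamedLWLockTranche("extvac_stats"))->lock;
    LWLockRelease(AddinShmemInitLock);
}

void
_PG_init(void)
{
    /* Shared memory can only be requested at preload time, hence the
     * shared_preload_libraries requirement. */
    if (!process_shared_preload_libraries_in_progress)
        return;

    prev_shmem_request_hook = shmem_request_hook;
    shmem_request_hook = extvac_shmem_request;
    prev_shmem_startup_hook = shmem_startup_hook;
    shmem_startup_hook = extvac_shmem_startup;
}
```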
I’m also unsure whether it’s better to store the statistics in the
cumulative statistics system (as done here) or entirely inside the
extension. Note that the code added to the core to support the extension
executes regardless of whether the extension is enabled.
Attachment | Content-Type | Size |
---|---|---|
0001-Core-patch-for-vacuum-statistics.patch | text/x-patch | 37.3 KB |
0001-Create-vacuum-extension-statistics.patch | text/x-patch | 75.1 KB |