Re: Count and log pages set all-frozen by vacuum

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Count and log pages set all-frozen by vacuum
Date: 2024-12-13 02:39:22
Message-ID: 420ae636-04c6-4c31-b26a-1f55c72e8d6d@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/11/24 20:18, Masahiko Sawada wrote:
>
> ...
>
>> Here's an example to exercise the new log message:
>>
>> create table foo (a int, b int) with (autovacuum_enabled = false);
>> insert into foo select generate_series(1,1000), 1;
>> delete from foo where a > 500;
>> vacuum (verbose) foo;
>> -- visibility map: 5 pages newly set all-visible, of which 2 set
>> all-frozen. 0 all-visible pages newly set all-frozen.
>> insert into foo select generate_series(1,500), 1;
>> vacuum (verbose, freeze) foo;
>> --visibility map: 3 pages newly set all-visible, of which 3 set
>> all-frozen. 2 all-visible pages newly set all-frozen.
>
> The output makes sense to me. The patch mostly looks good to me. Here
> is one minor comment:
>
> if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
> (old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0)
>
> Maybe it would be better to rewrite it to:
>
> if ((old_vmbits & (VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN)) == 0)
>
> Regards,
>

I agree the v4 patch is fine, although I find the wording with multiple
"newly" a bit verbose / confusing. Maybe like this would be better:

%u pages set all-visible, %u set all-frozen (%u were all-visible)

I don't want to drag this thread into an infinite discussion about the
best possible wording, so if others think the v4 wording is fine, I
won't object to it.

For me the bigger questions is whether these new counters added to te
vacuum log message are actually useful in practice. I understand it may
be useful while working on a patch related to eager freezing (although I
think Melanie changed the approach of that patch series, and it doesn't
actually require this patch anymore).

But I'm a bit skeptical about it being useful for regular users or DBAs.
I certainly don't remember me wanting to know these values per-vacuum.
Of course, maybe that's bias - knowing it's not available and thus not
asking for it. But I think it's also very hard to make conclusions about
the "freeze debt" from these per-vacuum values - we don't know how the
values combine. It might be disjunct set of pages (and then we should
just sum them), or maybe it's the same set of pages over and over again
(and then the debt doesn't really change).

It doesn't mean it's useless - e.g. we might compare the sum to a delta
of values from visibility_map_summary() and make some deductions about
how often are pages set all-visible repeatedly. And maybe vacuum should
log those before/after visibility_count values too, to make this easier.
Not sure how costly would that be ...

To me these visibility_count seem more important when quantifying the
freeze debt for a given table. But I think that also needs to consider
how old those all-visible pages are - the older the more it contributes
to the debt, I think. And that won't be in the vacuum log, of course.

Another thing is that analyzing vacuum log messages is ... not very
easy. Having to parse multi-line log messages, with a mix of text and
fields (not even mentioning translations) is not fun. Of course, none of
that is a fault of this patch, and I don't expect this patch to fix
that. But it's hard to get excited about new fields added to this log
message, when it'd be most useful aggregated for vacuums over some time
interval. I really wish we had some way to collect and access these
runtime stats in a structured way.

regards

--
Tomas Vondra

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2024-12-13 02:51:58 Re: Add Postgres module info
Previous Message Michael Paquier 2024-12-13 02:02:53 Re: per backend I/O statistics