Re: New GUC autovacuum_max_threshold ?

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Frédéric Yhuel <frederic(dot)yhuel(at)dalibo(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: Re: New GUC autovacuum_max_threshold ?
Date: 2024-04-25 16:51:45
Message-ID: CAAKRu_ZoieMN3abH9gjMRmRMf6M3f1n4x-U9LsfE02tJnnbN1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 25, 2024 at 2:52 AM Frédéric Yhuel
<frederic(dot)yhuel(at)dalibo(dot)com> wrote:
>
> Le 24/04/2024 à 21:10, Melanie Plageman a écrit :
> > On Wed, Apr 24, 2024 at 8:08 AM Frédéric Yhuel
> > <frederic(dot)yhuel(at)dalibo(dot)com> wrote:
> >>
> >> Hello,
> >>
> >> I would like to suggest a new parameter, autovacuum_max_threshold, which
> >> would set an upper limit on the number of tuples to delete/update/insert
> >> prior to vacuum/analyze.
> >
> > Hi Frédéric, thanks for the proposal! You are tackling a very tough
> > problem. I would also find it useful to know more about what led you
> > to suggest this particular solution. I am very interested in user
> > stories around difficulties with what tables are autovacuumed and
> > when.
> >
>
> Hi Melanie! I can certainly start compiling user stories about that.

Cool! That would be very useful.

> >> The idea would be to replace the following calculation :
> >>
> >> vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
> >>
> >> with this one :
> >>
> >> vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples / (1
> >> + vac_scale_factor * reltuples / autovacuum_max_threshold)
> >>
> >> (and the same for the others, vacinsthresh and anlthresh).
> >
> > My first thought when reviewing the GUC and how it is used is
> > wondering if its description is a bit misleading.
> >
> > autovacuum_vacuum_threshold is the "minimum number of updated or
> > deleted tuples needed to trigger a vacuum". That is, if this many
> > tuples are modified, it *may* trigger a vacuum, but we also may skip
> > vacuuming the table for other reasons or due to other factors.
> > autovacuum_max_threshold's proposed definition is the upper
> > limit/maximum number of tuples to insert/update/delete prior to
> > vacuum/analyze. This implies that if that many tuples have been
> > modified or inserted, the table will definitely be vacuumed -- which
> > isn't true. Maybe that is okay, but I thought I would bring it up.
> >
>
> I'm not too sure I understand. What are the reasons it might by skipped?
> I can think of a concurrent index creation on the same table, or
> anything holding a SHARE UPDATE EXCLUSIVE lock or above. Is this the
> sort of thing you are talking about?

No, I was thinking more literally that, if reltuples (assuming
reltuples is modified/inserted tuples) > autovacuum_max_threshold, I
would expect the table to be vacuumed. However, with your formula,
that wouldn't necessarily be true.

I think there are values of reltuples and autovacuum_max_threshold at
which reltuples > autovacuum_max_threshold but reltuples <=
vac_base_thresh + vac_scale_factor * reltuples / (1 + vac_scale_factor
* reltuples / autovacuum_max_threshold)

I tried to reduce the formula to come up with a precise definition of
the range of values for which this is true, however I wasn't able to
reduce it to something nice.

Here is just an example of a case:

vac_base_thresh = 2000
vac_scale_factor = 0.9
reltuples = 3200
autovacuum_max_threshold = 2500

total_thresh = vac_base_thresh + vac_scale_factor * reltuples / (1 +
vac_scale_factor * reltuples / autovacuum_max_threshold)

total_thresh: 3338. dead tuples: 3200. autovacuum_max_threshold: 2500

so there are more dead tuples than the max threshold, so it should
trigger a vacuum, but it doesn't because the total calculated
threshold is higher than the number of dead tuples.

This of course may not be a realistic scenario in practice. It works
best the closer scale factor is to 1 (wish I had derived the formula
successfully) and when autovacuum_max_threshold > 2 * vac_base_thresh.
So, maybe it is not an issue.

> Perhaps a better name for the GUC would be
> autovacuum_asymptotic_limit... or something like that?

If we keep the asymptotic part, that makes sense. I wonder if we have
to add another "vacuum" in there (e.g.
autovacuum_vacuum_max_threshold) to be consistent with the other gucs.
I don't really know why they have that extra "vacuum" in them, though.
Makes the names so long.

- Melanie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-04-25 17:12:39 Re: Why don't we support external input/output functions for the composite types
Previous Message Jeff Davis 2024-04-25 16:41:08 Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM