Re: Berserk Autovacuum (let's save next Mandrill)

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date: 2020-03-10 21:32:47
Message-ID: CAApHDvpuFS62z57EHyktTdx7aoRQ-AY3ZTtXFQki=ORv5z0LTQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, 11 Mar 2020 at 08:17, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
>
> On Tue, 2020-03-10 at 18:14 +0900, Masahiko Sawada wrote:
> > I have one question about this patch from architectural perspective:
> > have you considered to use autovacuum_vacuum_threshold and
> > autovacuum_vacuum_scale_factor also for this purpose? That is, we
> > compare the threshold computed by these values to not only the number
> > of dead tuples but also the number of inserted tuples. If the number
> > of dead tuples exceeds the threshold, we trigger autovacuum as usual.
> > On the other hand if the number of inserted tuples exceeds, we trigger
> > autovacuum with vacuum_freeze_min_age = 0. I'm concerned that how user
> > consider the settings of newly added two parameters. We will have in
> > total 4 parameters. Amit also was concerned about that[1].
> >
> > I think this idea also works fine. In insert-only table case, since
> > only the number of inserted tuples gets increased, only one threshold
> > (that is, threshold computed by autovacuum_vacuum_threshold and
> > autovacuum_vacuum_scale_factor) is enough to trigger autovacuum. And
> > in mostly-insert table case, in the first place, we can trigger
> > autovacuum even in current PostgreSQL, since we have some dead tuples.
> > But if we want to trigger autovacuum more frequently by the number of
> > newly inserted tuples, we can set that threshold lower while
> > considering only the number of inserted tuples.
>
> I am torn.
>
> On the one hand it would be wonderful not to have to add yet more GUCs
> to the already complicated autovacuum configuration. It already confuses
> too many users.
>
> On the other hand that will lead to unnecessary vacuums for small
> tables.
> Worse, the progression caused by the comparatively large scale
> factor may make it vacuum large tables too seldom.

I think we really need to discuss what the default values for these
INSERT-only vacuums should be before we can decide if we need 2
further GUCs to control the feature. Right now the defaults are a
scale factor of 0.0 and a threshold of 10 million tuples. I'm not
saying those are good or bad values, but if they are good, then
they're pretty different from the normal threshold of 50 and the
normal scale factor of 0.2. So, assuming the delete/update thresholds
are also good, we need the additional GUCs.
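
To put rough numbers on how different those defaults are, here is a
small Python sketch. It assumes the usual autovacuum arithmetic
(trigger point = threshold + scale factor * reltuples) and plugs in
the values quoted above. The function and constant names are made up
for illustration only; they are not taken from the patch.

```python
# Illustrative only: names are hypothetical, values are the defaults
# mentioned in this mail (50 / 0.2 for dead tuples, 10 million / 0.0
# for the proposed insert-triggered vacuum).

DEAD_TUPLE_DEFAULTS = (50, 0.2)        # (threshold, scale factor)
INSERT_DEFAULTS = (10_000_000, 0.0)    # (threshold, scale factor)


def vacuum_trigger(base_threshold: int, scale_factor: float, reltuples: int) -> float:
    """Number of changed tuples a table must exceed before autovacuum fires."""
    return base_threshold + scale_factor * reltuples


def compare(reltuples: int) -> tuple[float, float]:
    """Return (dead-tuple trigger, insert trigger) for a table of the given size."""
    return (
        vacuum_trigger(*DEAD_TUPLE_DEFAULTS, reltuples),
        vacuum_trigger(*INSERT_DEFAULTS, reltuples),
    )


if __name__ == "__main__":
    for reltuples in (1_000, 1_000_000, 1_000_000_000):
        dead, ins = compare(reltuples)
        print(f"{reltuples:>13,} rows: dead-tuple trigger {dead:>13,.0f}, "
              f"insert trigger {ins:>13,.0f}")
```

With these values the two trigger points only coincide for tables of
roughly 50 million rows; below that the insert trigger is far higher
than the dead-tuple one, and above it far lower.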

If someone wants to put forward a case for making the defaults more
similar, then perhaps we can consider merging the options. One case
might be the fact that we want INSERT-only tables to benefit from
Index Only Scans more often than after 10 million inserts.

As for pros and cons, feel free to add to the following list:

For new GUCs/reloptions:
1. Gives users more control over this new auto-vacuum behaviour
2. The new feature can be completely disabled. This might be very
useful for people who suffer from auto-vacuum starvation.

Against new GUCs/reloptions:
1. Adds more code, documentation and maintenance.
2. Adds more complexity to auto-vacuum configuration.
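
For anyone wanting to play with the trade-off, the two designs can be
sketched roughly as below (Python, illustration only). The "merged"
variant reuses the existing threshold of 50 and scale factor of 0.2
for inserted tuples, as Sawada-san suggested; the "separate" variant
uses the defaults currently in the patch. All function names and the
insert_enabled flag are hypothetical stand-ins, not the patch's actual
code or its way of disabling the feature.

```python
# Illustrative only: hypothetical names, defaults as quoted in this thread.

def trigger_point(base_threshold: int, scale_factor: float, reltuples: int) -> float:
    """threshold + scale factor * reltuples, as autovacuum computes it."""
    return base_threshold + scale_factor * reltuples


def needs_vacuum_merged(reltuples: int, dead: int, inserted: int) -> bool:
    """One pair of settings (50 / 0.2) compared against both counters."""
    limit = trigger_point(50, 0.2, reltuples)
    return dead > limit or inserted > limit


def needs_vacuum_separate(reltuples: int, dead: int, inserted: int,
                          insert_enabled: bool = True) -> bool:
    """Dead tuples use 50 / 0.2; inserts use their own 10 million / 0.0."""
    if dead > trigger_point(50, 0.2, reltuples):
        return True
    if not insert_enabled:
        # Hypothetical off switch for the insert-triggered vacuum only.
        return False
    return inserted > trigger_point(10_000_000, 0.0, reltuples)


if __name__ == "__main__":
    # Small table, a few hundred inserts, no dead tuples.
    print(needs_vacuum_merged(1_000, 0, 300), needs_vacuum_separate(1_000, 0, 300))
    # Huge table, 50 million inserts, no dead tuples.
    print(needs_vacuum_merged(1_000_000_000, 0, 50_000_000),
          needs_vacuum_separate(1_000_000_000, 0, 50_000_000))
```

The two calls at the bottom correspond to Laurenz's two worries: with
merged settings the small table is vacuumed after only a few hundred
inserts, while the huge one is not vacuumed even after 50 million.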

As for my opinion, I'm leaning towards keeping the additional options.
I think if we were just adding auto-vacuum to core code now, then I'd
be voting to keep the configuration as simple as possible. However,
that's far from the case, and we do have over a decade of people that
have gotten used to how auto-vacuum currently behaves. Many people are
unlikely to even notice the change, but some will, and then there will
be another group of people who want to turn it off, and that group
might be upset when we tell them that they can't, at least not without
flipping the big red "autovacuum" switch into the off position (which
I'm pretty hesitant to recommend that anyone ever do).
