Re: Berserk Autovacuum (let's save next Mandrill)

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date: 2020-03-16 03:53:43
Message-ID: CA+fd4k5HwScjcUbjSuZXWyU6JNENsOebktdi1ft39rJqLZ9N2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 13 Mar 2020 at 05:11, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Fri, 13 Mar 2020 at 01:43, Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >
> > On Thu, 12 Mar 2020 at 16:28, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > > Laurenz highlighted a seemingly very valid reason that the current
> > > GUCs cannot be reused. Namely, say the table has 1 billion rows, if we
> > > use the current scale factor of 0.2, then we'll run an insert-only
> > > vacuum every 200 million rows. If those INSERTs are one per
> > > transaction then the new feature does nothing as the wraparound vacuum
> > > will run instead. Since this feature was born due to large insert-only
> > > tables, this concern seems very valid to me.
> >
> > Yeah, I understand and agree that since most people would use default
> > values we can reduce mis-configuration cases by adding separate GUCs
> > that have appropriate default values for that purpose but on the other
> > hand I'm not sure it's worth that we cover the large insert-only table
> > case by adding separate GUCs in spite of being able to cover it even
> > by existing two GUCs.
>
> In light of the case above, do you have an alternative suggestion?
>
> > If we want to disable this feature on the
> > particular table, we can have a storage parameter that means not to
> > consider the number of inserted tuples rather than having multiple
> > GUCs that allows us to fine tuning. And IIUC even in the above case, I
> > think that if we trigger insert-only vacuum by comparing the number of
> > inserted tuples to the threshold computed by existing threshold and
> > scale factor, we can cover it.
>
> So you're suggesting we drive the insert-vacuums from existing
> scale_factor and threshold? What about the 1 billion row table
> example above?

My suggestion is the initial approach proposed by Justin; comparing
the number of inserted tuples to the threshold computed by
autovacuum_vacum_threshold and autovacuum_vacuum_scale_factor in order
to trigger autovacuum. But as discussed, there is a downside; if the
number of inserted tuples are almost the same as, but a little larger
than, the number of dead tuples, we will trigger insert-only vacuum
but it's wasteful.

There is already a consensus on introducing new 2 parameters, but as
the second idea I'd like to add one (or two) GUC(s) to my suggestion,
say autovacuum_vacuum_freeze_insert_ratio; this parameter is the ratio
of the number of inserted tuples for total number of tuples modified
and inserted, in order to trigger insert-only vacuum. For example,
suppose the table has 1,000,000 tuples and we set threshold = 0,
scale_factor = 0.2 and freeze_insert_ratio = 0.9, we will trigger
normal autovacuum when n_dead_tup + n_ins_since_vacuum > 200,000, but
we will instead trigger insert-only autovacuum, which is a vacuum with
vacuum_freeze_min_age = 0, when n_ins_since_vacuum > 180,000 (=200,000
* 0.9). IOW if 90% of modified tuples are insertions, we freeze tuples
aggressively. If we want to trigger insert-only vacuum only on
insert-only table we can set freeze_insert_ratio = 1.0. The down side
of this idea is that we cannot disable autovacuum triggered by the
number of inserted, although we might be able to introduce more one
GUC that controls whether to include the number of inserted tuples for
triggering autovacuum (say, autovacuum_vacuum_triggered_by_insert =
on|off). The pros of this idea would be that we can ensure that
insert-only vacuum will run only in the case where the ratio of
insertion is large enough.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2020-03-16 04:12:58 Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Previous Message Noah Misch 2020-03-16 03:46:47 Re: [HACKERS] WAL logging problem in 9.4.3?