Re: Berserk Autovacuum (let's save next Mandrill)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date: 2020-03-05 17:27:49
Message-ID: 20200305172749.GK684@telsasoft.com
Lists: pgsql-hackers

On Thu, Mar 05, 2020 at 03:27:31PM +0100, Laurenz Albe wrote:
> On Thu, 2020-03-05 at 19:40 +1300, David Rowley wrote:
> > 1. I'd go for 2 new GUCs and reloptions.
> > autovacuum_vacuum_insert_scale_factor and these should work exactly
>
> I disagree about the scale_factor (and have not added it to the
> updated version of the patch). If we have a scale_factor, then the
> time between successive autovacuum runs would increase as the table
> gets bigger, which defeats the purpose of reducing the impact of each
> autovacuum run.

I would vote to include scale factor. You're right that a nonzero scale factor
would cause vacuum to run with geometrically decreasing frequency. The same
thing currently happens with autoanalyze as a table grows in size. I found
that our monthly-partitioned tables were being analyzed too infrequently
towards the end of the month. (At the beginning of the month, 10% is 2.4 hours'
worth of timeseries data, but at the end of the month 10% is 3 days, which was
an issue because queries on the previous day could get rowcount estimates near
zero.)
If someone wanted to avoid that, they'd set scale_factor=0. I think this patch
should parallel what's already in place, and we can add documentation for the
behavior if need be. Possibly scale_factor should default to zero, which I
think might make sense since insert-only tables seem to be the main target of
this patch.
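
To make the arithmetic above concrete, here is a minimal sketch of how a
trigger threshold of the documented form (base threshold + scale factor *
number of tuples) grows with table size. The function names, the steady
rows_per_hour ingest rate, and the base threshold of 0 are illustrative
assumptions rather than anything taken from the patch, and the check is
approximated as "fires exactly at the threshold".

```python
def trigger_threshold(base_threshold: float, scale_factor: float,
                      reltuples: float) -> float:
    """Changed rows needed before a run fires: base + scale_factor * reltuples."""
    return base_threshold + scale_factor * reltuples


def hours_of_data_before_trigger(days_loaded: float, rows_per_hour: float,
                                 base_threshold: float = 0,
                                 scale_factor: float = 0.1) -> float:
    """Hours of new data that accumulate before the next run.

    Assumes a steady insert rate (hypothetical rows_per_hour) into a
    partition that already holds days_loaded days of data.
    """
    reltuples = days_loaded * 24 * rows_per_hour
    threshold = trigger_threshold(base_threshold, scale_factor, reltuples)
    return threshold / rows_per_hour


def table_sizes_at_runs(start_rows: float, base_threshold: float,
                        scale_factor: float, runs: int) -> list:
    """Table size at each successive run for an insert-only table.

    With scale_factor > 0 each gap is a fixed fraction of the current size,
    so the sizes grow geometrically; with scale_factor == 0 the gaps are
    constant.
    """
    sizes = []
    rows = start_rows
    for _ in range(runs):
        rows += trigger_threshold(base_threshold, scale_factor, rows)
        sizes.append(rows)
    return sizes


if __name__ == "__main__":
    rate = 1_000_000  # hypothetical rows per hour
    print(hours_of_data_before_trigger(1, rate))    # one day loaded
    print(hours_of_data_before_trigger(30, rate))   # thirty days loaded
    print(table_sizes_at_runs(1000, 0, 0.1, 3))     # scale factor only
    print(table_sizes_at_runs(0, 1000, 0.0, 3))     # fixed threshold only
```

With scale_factor=0.1, one day of loaded data gives roughly 2.4 hours between
runs and thirty days gives roughly 72 hours (3 days), matching the figures
above. Setting scale_factor=0 keeps the interval constant no matter how large
the table gets.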

> +++ b/doc/src/sgml/maintenance.sgml
> + <para>
> + Tables that have received more than
> + <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>
> + inserts since they were last vacuumed and are not eligible for vacuuming
> + based on the above criteria will be vacuumed to reduce the impact of a future
> + anti-wraparound vacuum run.
> + Such a vacuum will aggressively freeze tuples, and it will not clean up dead
> + index tuples.

"BUT will not clean .."

> +++ b/src/backend/postmaster/autovacuum.c
> + /*
> + * If the number of inserted tuples exceeds the limit

I would say "exceeds the threshold"

Thanks for working on this; we would use this feature on our insert-only
tables.

--
Justin
