Re: Berserk Autovacuum (let's save next Mandrill)

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date: 2020-03-20 05:59:57
Message-ID: a6b9a97b4b1dfcb61b6e3ccd66566d2da005eac3.camel@cybertec.at
Lists: pgsql-hackers

On Thu, 2020-03-19 at 15:17 -0700, Andres Freund wrote:
> I am doubtful it should be committed with the current settings. See below.
>
> > From 3ba4b572d82969bbb2af787d1bccc72f417ad3a0 Mon Sep 17 00:00:00 2001
> > From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
> > Date: Thu, 19 Mar 2020 20:26:43 +0100
> > Subject: [PATCH] Autovacuum tables that have received only inserts
> >
> > This avoids the known problem that insert-only tables
> > are never autovacuumed until they need to have their
> > anti-wraparound autovacuum, which then can be massive
> > and disruptive.
>
> Shouldn't this also mention index only scans? IMO that's at least as big
> a problem as the "large vacuum" problem.

Yes, that would be good.
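
To make the index-only-scan angle concrete: on an insert-only table
that never gets vacuumed, the visibility map is never updated, so
index-only scans keep fetching heap pages. A quick way to see that
(table and column names are made up):

    -- look for "Heap Fetches: N" in the Index Only Scan node
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT id FROM insert_only_table WHERE id BETWEEN 1 AND 1000;

A large heap fetch count means few pages are marked all-visible,
which only vacuum can fix.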

> I am *VERY* doubtful that the attempt of using a large threshold, and a
> tiny scale factor, is going to work out well. I'm not confident enough
> in my gut feeling to full-throatedly object, but confident enough that
> I'd immediately change it on any important database I operated.
>
> Independent of how large a constant you set the threshold to, for
> databases with substantially bigger tables this will lead to [near]
> constant vacuuming. As soon as you hit 1 billion rows - which isn't
> actually that much - this is equivalent to setting
> autovacuum_{vacuum,analyze}_scale_factor to 0.01. There are cases where
> that can be a sensible setting, but I don't think anybody would suggest
> it as a default.

There you are assuming that the bigger a table is, the more data
modifications it will receive, so that making the scale factor the
dominant element will work out better.
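
To spell out the arithmetic (the 10-million-row threshold here is
the one your 0.01 figure implies):

    -- effective scale factor of a pure 10-million-row threshold
    -- on a 1-billion-row table
    SELECT 10000000::numeric / 1000000000 AS effective_scale_factor;
    -- => 0.01

So on such a table, a constant threshold and a 0.01 scale factor
trigger vacuums at exactly the same rate.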

My experience is that the change rate (for inserts, at least; I am
less certain about updates and deletes) is more likely to be
independent of the table size. (Too) many large databases are so
large not because the data influx keeps growing over time, but
because people don't want to get rid of old data (or would very much
like to, but never planned for it).

This second scenario would be much better served by a high threshold and
a low scale factor.
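
For a database like that, the storage parameters added by the patch
would allow opting in per table. A sketch, using the parameter names
proposed in this thread, a made-up table name, and purely
illustrative values:

    ALTER TABLE append_only_log SET (
        autovacuum_vacuum_insert_threshold = 10000000,
        autovacuum_vacuum_insert_scale_factor = 0.0
    );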

> After thinking about it for a while, I think it's fundamentally flawed
> to use large constant thresholds to avoid unnecessary vacuums. It's easy
> to see cases where it's bad for common databases of today, but it'll be
> much worse a few years down the line, when common table sizes have grown
> by an order of magnitude or two. Nor do they address the difference between
> tables of a certain size with e.g. 2kB-wide rows and same-sized tables with
> 28-byte-wide rows. The point of constant thresholds imo can only be to
> avoid unnecessary work at the *small* (even tiny) end, not the opposite.
>
>
> I think there's too much "reinventing" autovacuum scheduling in a
> "local" insert-only manner happening in this thread. And as far as I can
> tell additionally only looking at a somewhat narrow slice of insert only
> workloads.

Perhaps. The traditional "high scale factor, low threshold" system
is (in my perception) mostly based on the objective of cleaning up
dead tuples. When autovacuum was introduced, index only scans were
only a dream.

With the objective of getting rid of dead tuples, having the scale factor
be the dominant part makes sense: it is OK for bloat to be a certain
percentage of the table size.
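
For reference, the existing trigger condition is documented as
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor *
reltuples dead tuples, i.e. 50 + 0.2 * reltuples with the default
settings. A rough sketch to eyeball where each table stands (it
hard-codes the defaults and ignores per-table overrides):

    SELECT s.relname,
           s.n_dead_tup,
           50 + 0.2 * c.reltuples AS vacuum_trigger
    FROM pg_stat_user_tables s
    JOIN pg_class c ON c.oid = s.relid
    ORDER BY c.reltuples DESC;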

Also, as you say, tables were much smaller then, and they will only
become bigger in the future. But I find that to be an argument *for*
making the threshold the dominant element: otherwise, you vacuum less
and less often, and the individual runs become larger and larger.
Now that vacuum skips pages where it knows it has nothing to do,
doesn't that take away much of the pain of vacuuming large tables
where nothing much has changed?
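
How much vacuum can actually skip is easy to check with the
pg_visibility module from contrib; a sketch, with a made-up table
name:

    CREATE EXTENSION IF NOT EXISTS pg_visibility;
    -- fraction of pages vacuum can skip outright
    SELECT (count(*) FILTER (WHERE all_visible))::numeric / count(*)
           AS all_visible_fraction
    FROM pg_visibility_map('big_table');

A fraction close to 1 means a vacuum run would mostly be a no-op.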

> I, again, strongly suggest using much more conservative values here. And
> then try to address the shortcomings - like not freezing aggressively
> enough - in separate patches (and by now separate releases, in all
> likelihood).

There is much to say for that, I agree.

> This will have a huge impact on a lot of postgres
> installations. Autovacuum already is perceived as one of the biggest
> issues around postgres. If the ratio of cases where these changes
> improve things to the cases it regresses isn't huge, it'll be painful
> (silent improvements are obviously less noticed than breakages).

Yes, that makes it scary to mess with autovacuum.

One of the problems I see in the course of this discussion is that one
can always come up with examples that make any choice look bad.
It is impossible to do it right for everybody.

In the light of that, I won't object to a more conservative default
value for the parameters, even though my considerations above suggest
the opposite to me. But perhaps my conclusions are based on flawed
premises.

Yours,
Laurenz Albe
