Re: Berserk Autovacuum (let's save next Mandrill)

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date: 2020-03-13 21:48:27
Message-ID: da71bf4bc29d7919d4900191dfd1186396e3b03a.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2020-03-13 at 13:44 -0500, Justin Pryzby wrote:
> Possible it would be better to run VACUUM *without* freeze_min_age=0 ? (I get
> confused and have to spend 20min re-reading the vacuum GUC docs every time I
> deal with this stuff, so maybe I'm off).
>
> As I understand, the initial motivation of this patch was to avoid disruptive
> anti-wraparound vacuums on insert-only table. But if vacuum were triggered at
> all, it would freeze the oldest tuples, which is all that's needed; especially
> since fd31cd2651 "Don't vacuum all-frozen pages.", those pages would never need
> to be vacuumed again. Recently written tuples wouldn't be frozen, which is ok,
> they're handled next time.

Freezing tuples too early is wasteful if the tuples get updated or deleted
soon after, but based on the assumption that an autovacuum triggered by insert
is dealing with an insert-mostly table, it is not that wasteful.

If we didn't freeze all tuples, it is easy to envision a situation where
bulk data loads load several million rows in a few transactions, which
would trigger a vacuum. With the normal vacuum_freeze_min_age, that vacuum
would do nothing at all. It is better if each vacuum freezes some rows,
in other words, if it does some of the anti-wraparound work.

> Another motivation of the patch is to allow indexonly scan, for which the
> planner looks at pages' "relallvisible" fraction (and at execution if a page
> isn't allvisible, visits the heap). Again, that happens if vacuum were run at
> all. Again, some pages won't be marked allvisible, which is fine, they're
> handled next time.

Yes, freezing is irrelevant with respect to index only scans, but it helps
with mitigating the impact of anti-wraparound vacuum runs.

> I think freeze_min_age=0 could negatively affect people who have insert-mostly
> tables (I'm not concerned, but that includes us). If they consistently hit the
> autovacuum insert threshold before the cleanup threshold for updated/deleted
> tuples, any updated/deleted tuples would be frozen, which would be
> wasteful:

I don't get that. Surely tuples whose xmax is committed won't be frozen.

> So my question is if autovacuum triggered by insert threshold should trigger
> VACUUM with the same settings as a vacuum due to deleted tuples. I realize the
> DBA could just configure the thresholds so they'd hit vacuum for cleaning dead
> tuples, so my suggestion maybe just improves the case with the default
> settings. It's possible to set the reloption autovacuum_freeze_min_age, which
> I think supports the idea of running a vacuum normally and letting it (and the
> DBA) decide what do with existing logic.

Yes, the DBA can explicitly set vacuum_freeze_min_age to 0.

But for one DBA who understands his or her workload well enough, and who knows
the workings of autovacuum well enough to do that kind of tuning, there are
99 DBAs who don't, and it is the goal of the patch (expressed in the subject)
to make things work for those people who go with the default.

And I believe that is better achieved with freezing as many tuples as possible.

> Also, there was a discussion about index cleanup with the conclusion that it
> was safer not to skip it, since otherwise indexes might bloat. I think that's
> right, since vacuum for cleanup is triggered by the number of dead heap tuples.
> To skip index cleanup, I think you'd want a metric for
> n_dead_since_index_cleanup. (Or maybe analyze could track dead index tuples
> and trigger vacuum of each index separately).

Yes, I think we pretty much all agree on that.

> Having now played with the patch, I'll suggest that 10000000 is too high a
> threshold. If autovacuum runs without FREEZE, I don't see why it couldn't be
> much lower (100000?) or use (0.2 * n_ins + 50) like the other autovacuum GUC.

There is the concern that that might treat large table to seldom.

I am curious - what were the findings that led you to think that 10000000
is too high?

Yours,
Laurenz Albe

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-03-13 22:42:32 Re: proposal: new polymorphic types - commontype and commontypearray
Previous Message Alvaro Herrera 2020-03-13 21:39:06 Re: range_agg