Re: Berserk Autovacuum (let's save next Mandrill)

From: Darafei "Komяpa" Praliaskouski <me(at)komzpa(dot)net>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date: 2020-03-05 10:16:22
Message-ID: CAC8Q8tK4095OuMefrwh4242x=i5LmqS7i-WoB+rnArP6O6Q-hA@mail.gmail.com
Lists: pgsql-hackers

Hi,

Thanks, Laurenz, for taking action on this and writing a better patch
than my initial one.
This will help avoid Mandrill-like downtimes and let Index Only Scans
just work on large telemetry databases like the one I was responsible
for back when I was at Juno.

On Thu, Mar 5, 2020 at 9:40 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Thu, 5 Mar 2020 at 04:15, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> > I just realized that the exercise is pointless unless that
> > autovacuum also runs with FREEZE on.

> 8. Should we care when setting the insert counter back to 0 if
> auto-vacuum has skipped pages?

I believe it would be enough just to leave a comment about this in code.

> 10. I'm slightly worried about the case where we don't quite trigger a
> normal vacuum but trigger a vacuum due to INSERTs then skip cleaning
> up the indexes but proceed to leave dead index entries causing indexes
> to become bloated. It does not seem impossible that given the right
> balance of INSERTs and UPDATE/DELETEs that this could happen every
> time and the indexes would just become larger and larger.

Could we avoid resetting the statistics about dead tuples after an
index-skipping vacuum, since it didn't really take care of them?

> 11. We probably do also need to debate if we want this on or off by
> default. I'd have leaned towards enabling by default if I'd not
> personally witnessed the fact that people rarely* increase auto-vacuum
> to run faster than the standard cost settings. I've seen hundreds of
> servers over the years with all workers busy for days on something
> they'll never finish quickly enough. We increased those settings 10x
> in PG12, so there will be fewer people around suffering from that now,
> but even after having reduced the vacuum_cost_delay x10 over the PG11
> settings, it's by no means fast enough for everyone. I've mixed
> feelings about giving auto-vacuum more work to do for those people, so
> perhaps the best option is to keep this off by default so as not to
> affect the people who don't tune auto-vacuum. They'll just suffer the
> pain all at once when they hit max freeze age instead of more
> gradually with the additional load on the workers. At least adding
> this feature gives the people who do tune auto-vacuum some ability to
> handle read-only tables in some sane way.
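For reference, the cost settings David mentions can be sketched as
follows. The PG12 default change he refers to is
autovacuum_vacuum_cost_delay going from 20ms to 2ms; the ALTER SYSTEM
values below are illustrative examples of tuning autovacuum to run
faster, not recommendations:

```sql
-- PG11 default: autovacuum_vacuum_cost_delay = 20ms
-- PG12 default: autovacuum_vacuum_cost_delay = 2ms (the 10x change above)

-- Illustrative sketch: letting autovacuum do more work per cycle.
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '1ms';
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;  -- default falls back to vacuum_cost_limit (200)
SELECT pg_reload_conf();
```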

That's exactly the situation we're trying to avoid with this patch.
Suffering all at once takes large production deployments down for
weeks, and that makes the news.
In current cloud setups, with IO budgets, it's plainly impossible to
read the whole database at all, let alone rewrite it.
I say we should enable this setting by default.
If my calculations are correct, it will freeze the table once for each
new gigabyte of data written, which usually fits within burst read
thresholds.
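The trigger condition under discussion can be sketched like this,
mirroring the shape of the existing dead-tuple threshold formula. The
GUC names and the 10-million-row default follow the patch being
reviewed in this thread and may change before commit; the function
name and gigabyte arithmetic are illustrative only (10M rows of
roughly 100-byte tuples is on the order of 1 GB, matching the
once-per-gigabyte estimate above):

```python
# Sketch of the insert-driven autovacuum trigger from the patch under
# review. Names/defaults are assumptions from this thread, not final.

def insert_vacuum_needed(inserts_since_vacuum: int,
                         reltuples: float,
                         insert_threshold: int = 10_000_000,
                         insert_scale_factor: float = 0.0) -> bool:
    """True once enough tuples were inserted since the last vacuum,
    analogous to the existing threshold + scale_factor * reltuples
    formula for dead tuples."""
    return inserts_since_vacuum > insert_threshold + insert_scale_factor * reltuples

# With ~100-byte rows, crossing the 10M-row default threshold means
# roughly one more gigabyte of data was written since the last vacuum.
print(insert_vacuum_needed(10_000_001, reltuples=50_000_000))
print(insert_vacuum_needed(5_000_000, reltuples=50_000_000))
```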

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
