Re: autovacuum prioritization

From: Robert Treat <rob(at)xzilla(dot)net>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: autovacuum prioritization
Date: 2022-02-01 17:35:41
Message-ID: CABV9wwNX5TarutBy5_d+3m2Q747Q=-P-hjTvCK-6c30NROj2fw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 26, 2022 at 6:56 PM Greg Stark <stark(at)mit(dot)edu> wrote:
>
> On Wed, 26 Jan 2022 at 18:46, Greg Stark <stark(at)mit(dot)edu> wrote:
> >
> > On Thu, 20 Jan 2022 at 14:31, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > >
> > > In my view, previous efforts in this area have been too simplistic.
> > >
> >
> > One thing I've been wanting to do something about is I think
> > autovacuum needs to be a little cleverer about when *not* to vacuum a
> > table because it won't do any good.
> >
> > I've seen a lot of cases where autovacuum kicks off a vacuum of a
> > table even though the globalxmin hasn't really advanced significantly
> > over the oldest frozen xid. When it's a large table this really hurts
> > because it could be hours or days before it finishes and at that point
> > there's quite a bit of bloat.
>
>
> Another case I would like to see autovacuum get clever about is when
> there is a wide disparity in the size of tables. If you have a few
> large tables and a few small tables there could be enough bandwidth
> for everyone but you can get in trouble if the workers are all tied up
> vacuuming the large tables.
>
> This is a case where autovacuum scheduling can create a problem where
> there shouldn't be one. It often happens when you have a set of large
> tables that were all loaded with data around the same time and you
> have your busy tables that are well designed small tables receiving
> lots of updates. They can happily be getting vacuumed every 15-30min
> and finishing promptly maintaining a nice steady state until one day
> all the large tables suddenly hit the freeze threshold and suddenly
> all your workers are busy vacuuming huge tables that take hours or
> days to vacuum and your small tables bloat by orders of magnitude.
>
> I was thinking of dividing the eligible tables up into ntiles based on
> size and then making sure one worker was responsible for each ntile.
> I'm not sure that would actually be quite right though.
>
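
On that first point, about skipping vacuums that can't accomplish much:
the test I picture looks roughly like the sketch below. The names and
threshold are invented, and real xid math has to cope with wraparound;
this is just to illustrate the shape of the check.

    # Illustrative only: if the global xmin horizon has barely advanced
    # past the table's relfrozenxid, a vacuum can't freeze or clean up
    # much, so a multi-hour run on a big table buys almost nothing.
    # (Ignores xid wraparound for simplicity.)
    def vacuum_worth_running(global_xmin, relfrozenxid,
                             autovacuum_freeze_max_age,
                             min_advance_frac=0.05):
        freezable_age = global_xmin - relfrozenxid
        return freezable_age >= min_advance_frac * autovacuum_freeze_max_age

    # Horizon only ~1M xids past relfrozenxid with the default 200M
    # freeze_max_age: skip it and come back when it will actually help.
    print(vacuum_worth_running(101_000_000, 100_000_000, 200_000_000))  # False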

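And to make the ntile idea concrete, here's the sort of split I picture
(names and sizes invented), with the caveat already raised that one
worker per size tier may not be quite the right carve-up:

    # Purely illustrative: sort the tables needing vacuum by size, cut
    # them into as many tiers as there are workers, and dedicate one
    # worker to each tier so small hot tables never queue behind a
    # multi-hour vacuum of a huge one.
    def assign_workers_by_ntile(tables, n_workers):
        """tables: list of (table_name, size_bytes) needing vacuum."""
        ordered = sorted(tables, key=lambda t: t[1])
        tier_size = max(1, len(ordered) // n_workers)
        assignments = {}
        for worker in range(n_workers):
            lo = worker * tier_size
            hi = len(ordered) if worker == n_workers - 1 else lo + tier_size
            assignments[worker] = ordered[lo:hi]
        return assignments

    # With three workers, the small hot tables each get their own tier
    # and the two big ones share the last worker.
    tables = [("events", 800 * 2**30), ("orders", 200 * 2**30),
              ("queue", 50 * 2**20), ("sessions", 10 * 2**20)]
    print(assign_workers_by_ntile(tables, 3))
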
I've been working off and on on some external vacuum scheduling tools
for the past year or so, and one issue that keeps coming up is a lack
of observability into the various cost delay/limit mechanisms, e.g.
how much a given vacuum contributes toward the cost limit, or how long
it was delayed during a given run. One theory was that if we are
seeing a lot of slowdown due to cost limiting, we should weight
smaller tables more heavily in the priority list of tables to vacuum,
versus larger tables, which we'd expect to exacerbate the situation.
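
To make that concrete, the sort of weighting I have in mind looks like
the sketch below. The inputs (in particular, time spent sleeping in
cost-based delay) are exactly the stats we can't easily get today,
which is the observability gap mentioned above; the names and formula
are made up.

    # Rough sketch: the more heavily throttled recent vacuums have been,
    # the more a table's size counts against it, so small tables float
    # to the top of the list rather than waiting behind giants that will
    # mostly just make the cost-limit contention worse.
    def prioritize(tables, throttled_fraction):
        """tables: list of dicts with 'name', 'size_bytes', 'dead_frac'.
        throttled_fraction: share of recent vacuum wall time spent in
        cost-delay sleeps across all workers (0.0 - 1.0)."""
        def score(t):
            urgency = t["dead_frac"]
            size_penalty = (t["size_bytes"] / 2**30) * throttled_fraction
            return urgency / (1.0 + size_penalty)
        return sorted(tables, key=score, reverse=True)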

I've also thought it'd be nice for users to have an easy way to
estimate the percentage of a table that is frozen (akin to live vs
dead tuples in pg_stat_all_tables), but that seems difficult to
maintain accurately. We had a similar issue with tracking the clock
time of vacuums: just keeping the duration of the last vacuum ended up
being insufficient for some cases, so we ended up tracking it
historically... we haven't quite gotten as far as designing a
pg_stat_vacuums a la pg_stat_statements, but it has crossed our minds.
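
The shape of what we ended up keeping externally is roughly the sketch
below; the names are invented and nothing like this exists in core, it
just shows why a single "last vacuum duration" wasn't enough for us.

    # Minimal sketch of per-table vacuum history kept outside the
    # database; invented structure, not an existing PostgreSQL feature.
    from collections import defaultdict, deque
    from statistics import median

    class VacuumHistory:
        def __init__(self, keep=50):
            # retain the last `keep` runs per table
            self.runs = defaultdict(lambda: deque(maxlen=keep))

        def record(self, table, started_at, duration_s,
                   pages_frozen, pages_total):
            self.runs[table].append({
                "started_at": started_at,
                "duration_s": duration_s,
                # rough frozen %, akin to live vs dead tuples in
                # pg_stat_all_tables; hard to keep accurate, as noted
                "frozen_frac": pages_frozen / max(pages_total, 1),
            })

        def typical_duration(self, table):
            """Median over recent runs; steadier than the last run alone."""
            runs = self.runs[table]
            return median(r["duration_s"] for r in runs) if runs else None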

Robert Treat
https://xzilla.net
