Re: autovacuum prioritization

From: Greg Stark <stark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: autovacuum prioritization
Date: 2022-01-26 23:46:00
Message-ID: CAM-w4HOq4uvzQxXt2d_PvF+2F=YR6Pr1gpxuQjLaN85xktk+qg@mail.gmail.com
Lists: pgsql-hackers

On Thu, 20 Jan 2022 at 14:31, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> In my view, previous efforts in this area have been too simplistic.
>

One thing I've been wanting to address: I think autovacuum needs to
be a little cleverer about when *not* to vacuum a table because it
won't do any good.

I've seen a lot of cases where autovacuum kicks off a vacuum of a
table even though the global xmin hasn't really advanced significantly
past the oldest frozen xid. When it's a large table this really hurts
because it could be hours or days before it finishes, and by that
point quite a bit of bloat has accumulated.

This isn't a common occurrence; it happens when the system is broken
in some way. Either there's an idle-in-transaction session or
something else is keeping the global xmin held back.

What it does though is make things *much* worse and *much* harder for
a non-expert to hit on the right remediation. It's easy enough to tell
them to look for these idle-in-transaction sessions or set timeouts.
It's much harder to determine whether it's a good idea for them to go
and kill the vacuum that's been running for days. And it's not a great
thing for people to be getting in the habit of doing either.

I want to be able to stop telling people to kill vacuums kicked off by
autovacuum. I feel like it's a bad thing for someone to ever have to
do, and I know that some fraction of the times I tell them to do it,
it will have been a terrible thing to have done (but we'll never know
which times those were). Determining whether a running vacuum is
actually doing any good is pretty hard, and on older versions probably
impossible.

I was thinking of just putting in a check before kicking off a vacuum:
if the global xmin is a significant fraction of the distance to the
relfrozenxid, then log a warning instead. Basically the warning means
"we can't keep the bloat below the threshold due to the idle
transactions et al, not because there's insufficient I/O bandwidth".
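
A minimal sketch of what such a pre-vacuum check might look like,
written in Python for brevity rather than as backend C. The reading of
"significant fraction" is an assumption: here the age of the global
xmin is compared with the age of the table's relfrozenxid, both
measured back from the next XID to be assigned. The function names and
the 0.5 cutoff are hypothetical, and the wraparound arithmetic is
simplified (it ignores the few reserved special XIDs).

```python
XID_MODULUS = 2**32  # transaction IDs are 32-bit counters that wrap around


def xid_age(next_xid: int, xid: int) -> int:
    """XIDs consumed since `xid`, allowing for wraparound (simplified)."""
    return (next_xid - xid) % XID_MODULUS


def xmin_held_back(next_xid: int, global_xmin: int, relfrozenxid: int,
                   max_held_fraction: float = 0.5) -> bool:
    """Return True if a vacuum now would probably do little good.

    Assumption: the horizon counts as "held back" when the global xmin is
    still old relative to the table's relfrozenxid, i.e. most XIDs consumed
    since the last freeze are still visible to some snapshot.
    """
    age_since_freeze = xid_age(next_xid, relfrozenxid)
    if age_since_freeze == 0:
        return False  # no XIDs consumed since the last freeze
    age_of_horizon = xid_age(next_xid, global_xmin)
    return age_of_horizon / age_since_freeze >= max_held_fraction


# Horizon stuck near relfrozenxid: skip the vacuum and warn instead.
stuck = xmin_held_back(next_xid=2000, global_xmin=1100, relfrozenxid=1000)
# Horizon close to the next XID: vacuuming can remove most dead rows.
healthy = xmin_held_back(next_xid=2000, global_xmin=1900, relfrozenxid=1000)
```

In this sketch `stuck` is True and `healthy` is False; a caller would
log the warning in the first case rather than launch the vacuum.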

At the same time it would be nice if autovacuum could recognize when
the I/O bandwidth is insufficient. If it finishes a vacuum it could
recheck whether the table is eligible for vacuuming and log that it's
unable to keep up with the vacuuming requirements -- but right now
that would be a lie much of the time, because often it's not a lack of
bandwidth that is preventing it from keeping up.
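
As an illustration only, the post-vacuum recheck could be combined
with a held-back-xmin test so that the "can't keep up" message is only
logged when bandwidth is the plausible cause. The sketch below is
Python, not backend code. The eligibility rule is the documented
autovacuum trigger (dead tuples exceeding autovacuum_vacuum_threshold
+ autovacuum_vacuum_scale_factor * reltuples, defaults 50 and 0.2);
the function names, the returned labels, and the `xmin_held_back` flag
passed in by the caller are hypothetical.

```python
def vacuum_threshold(reltuples: float, base_threshold: int = 50,
                     scale_factor: float = 0.2) -> float:
    """Dead-tuple count above which autovacuum considers a table eligible."""
    return base_threshold + scale_factor * reltuples


def post_vacuum_diagnosis(dead_tuples_after: int, reltuples: float,
                          xmin_held_back: bool) -> str:
    """Classify the state of a table right after a vacuum finishes."""
    if dead_tuples_after <= vacuum_threshold(reltuples):
        return "ok"                  # no longer eligible, nothing to report
    if xmin_held_back:
        return "blocked-by-xmin"     # dead rows could not be removed yet
    return "cannot-keep-up"          # still eligible: likely too little I/O
```

Only the "cannot-keep-up" result would justify logging that autovacuum
is falling behind; "blocked-by-xmin" points at long-running or idle
transactions instead.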

--
greg
