Re: do only critical work during single-user vacuum?

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: do only critical work during single-user vacuum?
Date: 2022-01-19 05:46:01
Message-ID: CAD21AoCj-uRMkWFaZkDpKaDWcJ4BCuoYVWPRCuGDPcNNwYVy_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 14, 2022 at 7:04 AM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
>
> On 1/13/22, 4:58 AM, "John Naylor" <john(dot)naylor(at)enterprisedb(dot)com> wrote:
> > On Wed, Jan 12, 2022 at 12:26 PM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
> >> As I've stated upthread, Sawada-san's suggested approach was my
> >> initial reaction to this thread. I'm not wedded to the idea of adding
> >> new options, but I think there are a couple of advantages. For both
> >> single-user mode and normal operation (which may be in imminent
> >> wraparound danger), you could use the same command:
> >>
> >> VACUUM (MIN_XID_AGE 1600000000, ...);
> >
> > My proposed top-level statement can also be used in normal operation,
> > so the only possible advantage is configurability. But I don't really
> > see any advantage in that -- I don't think we should be moving in the
> > direction of adding more-intricate ways to paper over the deficiencies
> > in autovacuum scheduling. (It could be argued that I'm doing exactly
> > that in this whole thread, but [imminent] shutdown situations have
> > other causes besides deficient scheduling.)
>
> The new top-level command would be configurable, right? Your patch
> uses autovacuum_freeze_max_age/autovacuum_multixact_freeze_max_age, so
> the behavior of this new command now depends on the values of
> parameters that won't obviously be related to it. If these parameters
> are set very low (e.g., the default values), then this command will
> end up doing far more work than is probably necessary.
>
> If we did go the route of using a parameter to determine which tables
> to vacuum, I think vacuum_failsafe_age is a much better candidate, as
> it defaults to a much higher value that is more likely to prevent
> doing extra work. That being said, I don't know if overloading
> parameters is the right way to go.
>
> >> (As an aside, we'd need to figure out how XID and MXID options would
> >> work together. Presumably most users would want to OR them.)
> >>
> >> This doesn't really tie in super nicely with the failsafe mechanism,
> >> but adding something like a FAILSAFE option doesn't seem right to me,
> >
> > I agree -- it would be awkward and messy as an option. However, I see
> > the same problem with xid/mxid -- I would actually argue they are not
> > even proper options; they are "selectors". Your comments above about
> > 1) needing to OR them and 2) emitting a message when a VACUUM command
> > doesn't actually do anything are evidence of that fact.
>
> That's a fair point. But I don't think these problems are totally
> intractable. We already emit "skipping" messages from VACUUM
> sometimes, and interactions between VACUUM options exist today, too.
> For example, FREEZE is redundant when FULL is specified, and
> INDEX_CLEANUP is totally ignored when FULL is used.
>
> >> The other advantage I see with age-related options is that it can be
> >> useful for non-imminent-wraparound situations as well. For example,
> >> maybe a user just wants to manually vacuum everything (including
> >> indexes) with an age above 500M on the weekends.

I also think there is a use case where a user just wants to manually
vacuum tables that are older than a certain threshold. In this case,
they might want to specify VACUUM command options such as the parallel
option while selecting tables.

> >
> > There is already vaccumdb for that, and I think it's method of
> > selecting tables is sound -- I'm not convinced that pushing table
> > selection to the server command as "options" is an improvement.

I think that having the user not rely on vacuumdb by implementing it
on the server side would be an improvement.

> I guess I'm ultimately imagining the new options as replacing the
> vacuumdb implementation. IOW vacuumdb would just use MIN_(M)XID_AGE
> behind the scenes (as would a new top-level command).

I had the same idea.

That having been said, I agree that xid/mxid options are different
things from the existing VACUUM command options; whereas the existing
VACUUM options control its behavior, xid/mxid options are selectors
for tables to vacuum (PROCESS_TOAST option could be a selector but I
think it’s slightly different from xid/mxid options).

IIUC what we want to do here are two things: (1) select only old
tables and (2) set INDEX_CLEANUP = off, TRUNCATE = off, and FREEZE =
on. VACUUM LIMIT statement does both things at the same time. Although
I’m concerned a bit about its flexibility, it’s a reasonable solution.

On the other hand, it’s probably also useful to do either one thing in
some cases. For instance, having a selector for (1) would be useful,
and having a new option like FAST_FREEZE for (2) would also be useful.
Given there is already a way for (2) (it does not default though), I
think it might also be a good start inventing something for (1). For
instance, a selector for VACUUM statement I came up with is:

VACUUM (verbose on) TABLES WITH (min_xid_age = 1600000000);
or
VACUUM (verbose on) TABLES WITH (min_age = failsafe_limit);

We can expand it in the future to select tables by, for example, dead
tuple ratio, size, etc.

It's a random thought but maybe worth considering.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2022-01-19 06:32:19 Re: Skipping logical replication transactions on subscriber side
Previous Message Pavel Stehule 2022-01-19 05:07:37 Re: Schema variables - new implementation for Postgres 15