Re: do only critical work during single-user vacuum?

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: do only critical work during single-user vacuum?
Date: 2022-02-03 08:14:07
Message-ID: CAD21AoD06hKYbx_vYj8WGA=ccPHKZ30R8dgX1aFXHbndyyu+pw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 2, 2022 at 6:50 AM John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
> On Thu, Jan 27, 2022 at 8:28 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> > I'm sure you meant "&" here (fixed in attached patch to appease the cfbot):
> > + if (options | VACOPT_MINIMAL)
>
> Thanks for catching that! That copy-pasto was also masking my failure
> to process the option properly -- fixed in the attached as v5.
>
> > It should either refuse to run if a list of tables is specified with MINIMAL,
> > or it should filter that list by XID condition.
>
> I went with the former for simplicity. As a single-purpose option, it
> makes sense.
>
> > As for the name, it could be MINIMAL or FAILSAFE or EMERGENCY or ??
> > I think the name should actually be a bit more descriptive, and maybe say XID,
> > like MINIMAL_XID or XID_EMERGENCY...
>
> I went with EMERGENCY in this version to reinforce its purpose in the
> mind of the user (and reader of this code).
>
> > Normally, options are independent, but VACUUM (MINIMAL) is a "shortcut" to a
> > hardcoded set of options: freeze on, truncate off, cleanup off. So it refuses
> > to be combined with other options - good.
> >
> > This is effectively a shortcut to hypothetical parameters for selecting tables
> > by XID/MXID age. In the future, someone could debate adding user-facing knobs
> > for table selection by age.
>
> I used the params struct in v5 for the emergency cutoff ages. Even
> with the values hard-coded, it seems cleaner to keep them here.
>
> > I still wonder if the relations should be processed in order of decreasing age.
> > An admin might have increased autovacuum_freeze_max_age up to 2e9, and your
> > query might return thousands of tables, with a wide range of sizes and ages.
> >
> > Processing them in order of decreasing age would allow the admin to quickly
> > vacuum the oldest tables, and optionally interrupt vacuum to get out of single
> > user mode ASAP - even if their just want to run VACUUM(MINIMAL) in a normal
> > backend when services aren't offline. Processing them out of order might be
> > pretty surprising - they might run vacuum for an hour (or overnight), cancel
> > it, attempt to start the DB in normal mode, and conclude that it made no
> > visible progress.
>
> While that seems like a nice property to have, it does complicate
> things, so can be left for follow-on work.
>
> Also in v5:
>
> - It mentions the new command in the error hint in
> GetNewTransactionId(). I'm not sure if multi-word commands should be
> quoted like this.
> - A first draft of documentation

Thank you for updating the patch.

I have a few questions and comments:

+ The only other option that may be combined with
<literal>VERBOSE</literal>, although in single-user mode no client
messages are
+ output.

Given VERBOSE with EMERGENCY can work only in multi-user mode, why
only VERBOSE can be specified with EMERGENCY? I think the same is true
for other options like PARALLEL; PARALLEL can work only in multi-user
mode.

---
+ It performs a database-wide vacuum on tables, toast tables, and
materialized views whose
+ xid age or mxid age is older than 1 billion.

Do we need to allow the user to specify the threshold or need a higher
value (at least larger than 1.6 billion, default value of
vacuum_failsafe_age)? I imagined a case where there are a few very-old
tables (say 2 billion old) and many tables that are older than 1
billion. In this case, VACUUM (EMERGENCY) would take a long time to
complete. But to minimize the downtime, we might want to run VACUUM
(EMERGENCY) on only the very-old tables, start the cluster in
multi-user mode, and run vacuum on multiple tables in parallel.

---
+ if (params->options & VACOPT_EMERGENCY)
+ {
+ /*
+ * Only consider relations able to hold unfrozen XIDs (anything else
+ * should have InvalidTransactionId in relfrozenxid anyway).
+ */
+ if (classForm->relkind != RELKIND_RELATION &&
+ classForm->relkind != RELKIND_MATVIEW &&
+ classForm->relkind != RELKIND_TOASTVALUE)
+ {
+ Assert(!TransactionIdIsValid(classForm->relfrozenxid));
+ Assert(!MultiXactIdIsValid(classForm->relminmxid));
+ continue;
+ }
+
+ table_xid_age = DirectFunctionCall1(xid_age,
classForm->relfrozenxid);
+ table_mxid_age = DirectFunctionCall1(mxid_age,
classForm->relminmxid);
+

I think that instead of calling xid_age and mxid_age for each
relation, we can compute the thresholds for xid and mxid once, and
then compare them to relation's relfrozenxid and relminmxid.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2022-02-03 08:29:54 Re: row filtering for logical replication
Previous Message Robert Treat 2022-02-03 08:09:07 Re: RFC: Logging plan of the running query