RE: Add MAIN_RELATION_CLEANUP and SECONDARY_RELATION_CLEANUP options to VACUUM

From: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>
To: "'Bossart, Nathan'" <bossartn(at)amazon(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Add MAIN_RELATION_CLEANUP and SECONDARY_RELATION_CLEANUP options to VACUUM
Date: 2020-07-14 05:34:01
Message-ID: OSBPR01MB2341077BB2B7FDDEA5CD7D0AEF610@OSBPR01MB2341.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, July 14, 2020 3:01 AM (GMT+9), Bossart, Nathan wrote:

Hi Nathan,

>On 7/13/20, 11:02 AM, "Justin Pryzby" <pryzby(at)telsasoft(dot)com> wrote:
>> Should bin/vacuumdb support this?
>
>Yes, it should. I've added it in v5 of the patch.

Thank you for the updated patch. I've joined as a reviewer.
I've also noticed that you have incorporated Justin's suggested vacuumdb support
in the recent patch, but in my opinion it'd be better to split them for better readability.
According to the cfbot, patch applies cleanly and passes all the tests.

[Use Case]
>The main use case I'm targeting is when the level of bloat or
>transaction ages of a relation and its TOAST table have significantly
>diverged. In these scenarios, it could be beneficial to be able to
>vacuum just one or the other, especially if the tables are large.
>...
>I reused the existing VACOPT_SKIPTOAST option to implement
>SECONDARY_RELATION_CLEANUP. This option is currently only used for
>autovacuum.

Perhaps this has not gathered much attention yet because it's not experienced
by many, but I don't see any problem with the additional options on manual
VACUUM on top of existing autovacuum cleanups. And I think this is useful
for the special use case mentioned, especially that toast table access is not
in public as per role limitation.

[Docs]
I also agree with "TOAST_TABLE_CLEANUP" and just name the options after the
respective proposed relation types in the future.

+ <term><literal>MAIN_RELATION_CLEANUP</literal></term>
+ <listitem>
+ <para>
+ Specifies that <command>VACUUM</command> should attempt to process the
+ main relation. This is normally the desired behavior and is the default.
+ Setting this option to false may be useful when it is necessary to only
+ vacuum a relation's corresponding <literal>TOAST</literal> table.

Perhaps it's just my own opinion, but I think the word "process" is vague for
a beginner in postgres reading the documents. OTOH, I know it's also used
in the source code, so I guess it's just the convention. And "process" is
intuititve as "processing tables". Anyway, just my 2 cents & isn't a strong
opinion.

Also, there's an extra space between the 1st and 2nd sentences.

+ <term><literal>TOAST_TABLE_CLEANUP</literal></term>
+ <listitem>
+ <para>
+ Specifies that <command>VACUUM</command> should attempt to process the
+ corresponding <literal>TOAST</literal> table for each relation, if one
+ exists. This is normally the desired behavior and is the default.
+ Setting this option to false may be useful when it is necessary to only
+ vacuum the main relation. This option cannot be disabled when the
+ <literal>FULL</literal> option is used.

Same comments as above, & extra spaces in between the sentences.

@@ -1841,9 +1865,16 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
/*
* Remember the relation's TOAST relation for later, if the caller asked
* us to process it. In VACUUM FULL, though, the toast table is
- * automatically rebuilt by cluster_rel so we shouldn't recurse to it.
+ * automatically rebuilt by cluster_rel, so we shouldn't recurse to it
+ * unless MAIN_RELATION_CLEANUP is disabled.

The additional last line is a bit confusing (and may be unnecessary/unrelated).
To clarify this thread on VACUUM FULL and my understanding of revised vacuum_rel below,
we allow MAIN_RELATION_CLEANUP option to be disabled (skip processing main relation)
and TOAST_TABLE_CLEANUP should be disabled because cluster_rel() will process the
toast table anyway.
Is my understanding correct? If yes, then maybe "unless" should be "even if" instead,
or we can just remove the line.

static bool
-vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
+vacuum_rel(Oid relid,
+ RangeVar *relation,
+ VacuumParams *params,
+ bool processing_toast_table)
{
...
+ bool process_toast;
...

- if (!(params->options & VACOPT_SKIPTOAST) && !(params->options & VACOPT_FULL))
+ process_toast = (params->options & VACOPT_TOAST_CLEANUP) != 0;
+
+ if ((params->options & VACOPT_FULL) != 0 &&
+ (params->options & VACOPT_MAIN_REL_CLEANUP) != 0)
+ process_toast = false;
+
+ if (process_toast)
toast_relid = onerel->rd_rel->reltoastrelid;
else
toast_relid = InvalidOid;
...

* Do the actual work --- either FULL or "lazy" vacuum
+ *
+ * We skip this part if we're processing the main relation and
+ * MAIN_RELATION_CLEANUP has been disabled.
*/
- if (params->options & VACOPT_FULL)
+ if ((params->options & VACOPT_MAIN_REL_CLEANUP) != 0 ||
+ processing_toast_table)
...
if (toast_relid != InvalidOid)
- vacuum_rel(toast_relid, NULL, params);
+ vacuum_rel(toast_relid, NULL, params, true);

>I've attached v3 of the patch, which removes the restriction on
>ANALYZE with MAIN_RELATION_CLEANUP disabled.

I've also confirmed those through regression + tap test in my own env
and they've passed. I'll look into deeply again if I find problems.

I think this follows the similar course of previously added VACUUM and
vacuummdb options (for using and skipping truncate, index cleanup, etc.),
so the patch seems almost plausible enough for me.

Regards,
Kirk

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2020-07-14 05:43:41 Re: Improvements in Copy From
Previous Message Michael Paquier 2020-07-14 05:31:04 Re: TAP tests and symlinks on Windows