Re: [Proposal] vacuumdb --schema only

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Robert Treat <rob(at)xzilla(dot)net>
Cc: Gilles Darold <gilles(at)migops(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] vacuumdb --schema only
Date: 2022-03-30 21:22:58
Message-ID: 20220330212258.GA2254@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I took a look at the v4 patch.

'git-apply' complains about whitespace errors:

0001-vacuumdb-schema-only-v4.patch:17: tab in indent.
<arg choice="plain">
0001-vacuumdb-schema-only-v4.patch:18: tab in indent.
<arg choice="opt">
0001-vacuumdb-schema-only-v4.patch:19: tab in indent.
<group choice="plain">
0001-vacuumdb-schema-only-v4.patch:20: tab in indent.
<arg choice="plain"><option>-n</option></arg>
0001-vacuumdb-schema-only-v4.patch:21: tab in indent.
<arg choice="plain"><option>--schema</option></arg>
warning: squelched 13 whitespace errors
warning: 18 lines add whitespace errors.

+ printf(_(" -N, --exclude-schema=PATTERN do NOT vacuum tables in the specified schema(s)\n"));

I'm personally -1 for the --exclude-schema option. I don't see any
existing "exclude" options in vacuumdb, and the uses for such an option
seem rather limited. If we can point to specific use-cases for this
option, I might be willing to change my vote.

+ <para>
+ To clean all tables in the <literal>Foo</literal> and <literal>bar</literal> schemas
+ only in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>vacuumdb --schema='"Foo"' --schema='bar' xyzzy</userinput>
+</screen></para>

nitpicks: I think the phrasing should be "To only clean tables in the...".
Also, is there any reason to use a schema name with a capital letter as an
example? IMO that just adds unnecessary complexity to the example.

+$node->issues_sql_like(
+ [ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+ qr/VACUUM "Foo".*/,
+ 'vacuumdb --schema schema only');

IIUC there should only be one table in the schema. Can we avoid matching
"*" and check for the exact command instead?

I think there should be a few more test cases. For example, we should test
using -n and -N at the same time, and we should test what happens when
those options are used for missing schemas.

+ /*
+ * When filtering on schema name, filter by table is not allowed.
+ * The schema name can already be set to a fqdn table name.
+ */
+ if (tbl_count && (schemas.head != NULL))
+ {
+ pg_log_error("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
+ exit(1);
+ }

I think there might be some useful refactoring we can do that would
simplify adding similar options in the future. Specifically, can we have a
global variable that stores the type of vacuumdb command (e.g., all,
tables, or schemas)? If so, perhaps the tables list could be renamed and
reused for schemas (and any other objects that need listing in the future).

+ if (schemas != NULL && schemas->head != NULL)
+ {
+ appendPQExpBufferStr(&catalog_query,
+ " AND c.relnamespace");
+ if (schema_is_exclude == TRI_YES)
+ appendPQExpBufferStr(&catalog_query,
+ " OPERATOR(pg_catalog.!=) ALL (ARRAY[");
+ else if (schema_is_exclude == TRI_NO)
+ appendPQExpBufferStr(&catalog_query,
+ " OPERATOR(pg_catalog.=) ANY (ARRAY[");
+
+ for (cell = schemas->head; cell != NULL; cell = cell->next)
+ {
+ appendStringLiteralConn(&catalog_query, cell->val, conn);
+
+ if (cell->next != NULL)
+ appendPQExpBufferStr(&catalog_query, ", ");
+ }
+
+ /* Finish formatting schema filter */
+ appendPQExpBufferStr(&catalog_query, "]::pg_catalog.regnamespace[])\n");
+ }

IMO we should use a CTE for specified schemas like we do for the specified
tables. I wonder if we could even have a mostly-shared CTE code path for
all vacuumdb commands with a list of names.

- /*
- * If no tables were listed, filter for the relevant relation types. If
- * tables were given via --table, don't bother filtering by relation type.
- * Instead, let the server decide whether a given relation can be
- * processed in which case the user will know about it.
- */
- if (!tables_listed)
+ else
{
+ /*
+ * If no tables were listed, filter for the relevant relation types. If
+ * tables were given via --table, don't bother filtering by relation type.
+ * Instead, let the server decide whether a given relation can be
+ * processed in which case the user will know about it.
+ */

nitpick: This change seems unnecessary.

I noticed upthread that there was some discussion around adding a way to
specify a schema in VACUUM and ANALYZE commands. I think this patch is
useful even if such an option is eventually added, as we'll still want
vacuumdb to obtain the full list of tables to process so that it can
effectively parallelize.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-03-30 21:41:36 Re: Correct docs re: rewriting indexes when table rewrite is skipped
Previous Message Andres Freund 2022-03-30 21:08:41 Re: Higher level questions around shared memory stats