Re: [Proposal] Allow users to specify multiple tables in VACUUM commands

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: [Proposal] Allow users to specify multiple tables in VACUUM commands
Date: 2017-08-29 16:47:30
Message-ID: 2EB66FDC-441C-4BE4-8CC2-84C0BEAB17A0@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8/28/17, 11:26 PM, "Michael Paquier" <michael(dot)paquier(at)gmail(dot)com> wrote:
> Here is some input for vacuum_multiple_tables_v9, about which I think
> that we are getting to something committable. Here are some minor
> comments.

Thanks for another review.

> <para>
> - With no parameter, <command>VACUUM</command> processes every table in the
> + With no parameters, <command>VACUUM</command> processes every table in the
> current database that the current user has permission to vacuum.
> - With a parameter, <command>VACUUM</command> processes only that table.
> + With parameters, <command>VACUUM</command> processes only the tables
> + specified.
> </para>
> The part about parameters looks fine to me if unchanged.

This is reverted in v10.

> + foreach(lc, relations)
> + {
> + VacuumRelation *relation = lfirst_node(VacuumRelation, lc);
> + if (relation->va_cols != NIL && (options & VACOPT_ANALYZE) == 0)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("ANALYZE option must be specified when a
> column list is provided")));
> + }
> Could you add a hint with the relation name involved here? When many
> relations are defined in the VACUUM query this would be useful for the
> user.

Added in v10.

> + relinfo = makeNode(VacuumRelation);
> + rel_oid = HeapTupleGetOid(tuple);
> + relinfo->oid = rel_oid;
> There are 4 patterns like that in the patch. We could make use of a
> makeVacuumRelation.

Agreed, I've added one.

> About the de-duplication patch, I have to admit that I am still not a
> fan of doing such a thing. Another road that we could take is to
> simply complain with a proper error message if:
> - the same column name is specified twice for a relation.
> - the same relation is defined twice. In the case of partitions, we
> could track the fact that it is already listed as part of a parent,
> though perhaps it does not seem worth the extra CPU cost especially
> when there are multiple nesting levels with partitions.
> Autovacuum has also the advantage, if I recall correctly, to select
> all columns for analyze, and skip parent partitions when scanning for
> relations so that's a safe bet from this side. Opinions welcome.

I lean towards favoring the de-duplication patch, but maybe I am biased
as the author. I can see the following advantages:

1. Ease of use. By taking care of de-duplicating on behalf of the user,
they needn't worry about inheritance structures or accidentally
specifying the same relation or column twice. This might be especially
useful if a large number of relations or columns must be specified.
2. Resource conservation. By de-duplicating, VACUUM and ANALYZE are
doing roughly the same thing but with less work.
3. The obnoxious errors you were experiencing are resolved. This seems
like the strongest argument to me, as it fixes an existing issue.

Disadvantages might include:

1. Users cannot schedule repeated VACUUMs on the same relation (e.g.
'VACUUM table, table, table;'). However, I cannot think of a time when
I needed this, and it seems like something else is wrong with VACUUM if
folks are resorting to this. In the end, you could still achieve this
via several VACUUM statements.
2. Any inferred ordering for how the relations are processed will not
be accurate if there are duplicates. Ultimately, users might lose some
amount of control here, but I am not sure how prevalent this use case
might be. In the worst case, you could achieve this via several
individual VACUUM statements as well.

Your suggestion to ERROR seems like a reasonable compromise, but I
could see it causing frustration in some cases, especially with
partitioning.

Nathan

Attachment Content-Type Size
dedupe_vacuum_relations_v2.patch application/octet-stream 4.7 KB
vacuum_multiple_tables_v10.patch application/octet-stream 30.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-08-29 18:18:15 Re: Hash Functions
Previous Message Tom Lane 2017-08-29 16:35:43 Re: [HACKERS] [postgresql 10 beta3] unrecognized node type: 90