Re: [Proposal] Allow users to specify multiple tables in VACUUM commands

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: [Proposal] Allow users to specify multiple tables in VACUUM commands
Date: 2017-08-24 03:38:10
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On 8/18/17, 12:56 AM, "Michael Paquier" <michael(dot)paquier(at)gmail(dot)com> wrote:
> Another pass for this patch.

Thanks! I've attached v8 of the patch.

On 8/18/17, 1:49 PM, "Robert Haas" <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Aug 18, 2017 at 1:56 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> + /*
>> + * We have already checked the column list in vacuum(...),
>> + * but the columns may have disappeared since then. If
>> + * this happens, emit a nice warning message and skip the
>> + * undefined column.
>> + */
>> I think that this would be reworded. "nice" is cute is this context.
>> Why not just saying something like:
>> "Do not issue an ERROR if a column is missing but use a WARNING
>> instead. At the beginning of the VACUUM run, the code already checked
>> for undefined columns and informed about an ERROR, but we want the
>> processing to move on for existing columns."
> Hmm, I find your (Michael's) suggestion substantially less clear than
> the wording to which you are objecting.

I'll leave this as-is for now.

> And actually, your patch does not seem to work, and makes things worse:
> =# analyze aa (a, a);
> ERROR: XX000: tuple already updated by self
> LOCATION: simple_heap_update, heapam.c:4482

I think the underlying issue is that dedupe_relations(...) does not
handle duplicate columns correctly. The attached patch should fix that
issue. I've also added some test cases to cover this.

> +/*
> + * This is used to keep track of a relation and an optional list of
> + * column names, as may be specified in VACUUM and ANALYZE.
> + */
> +typedef struct RelationAndColumns
> +{
> + NodeTag type;
> + RangeVar *relation; /* single table to process */
> + List *va_cols; /* list of column names, or NIL for all */
> + List *oids; /* corresponding OIDs (filled in by
> [auto]vacuum.c) */
> +} RelationAndColumns;
> This name is a bit awkward. Say VacuumRelation? I also don't
> understand why there are multiple OIDs here. There should be only one,
> referring to the relation of this RangeVar. Even for inherited
> relations what should be done is to add one entry RelationAndColumns
> (or VacuumRelation) for each child relation.

I'll admit I struggled with naming this struct, so I'm happy to change it.

According to the docs, VACUUM and ANALYZE "do not support recursing over
inheritance hierarchies" [1]. However, we need a list of OIDs for
partitioned tables. Namely, this piece of code in get_rel_oids(...):

if (include_parts)
oid_list = list_concat(oid_list,
find_all_inheritors(relid, NoLock, NULL));
oid_list = lappend_oid(oid_list, relid);

Since all of these OIDs should correspond to the same partitioned table,
it makes sense to me to leave them together instead of breaking out each
partition into a VacuumRelation. If we did so, I think we would also need
to duplicate the va_cols list for each partition. What do you think?

> + /*
> + * Check that all specified columns exist so that we can fast-fail
> + * commands with multiple tables. If the column disappears before we
> + * actually process it, we will emit a WARNING and skip it later on.
> + */
> + foreach(relation, relations)
> + check_columns_exist(lfirst(relation));
> The full list could be processed in there.

Sure. I've made this change.

> +static void
> +check_columns_exist(RelationAndColumns *relation)
> [...]
> + Relation rel;
> + ListCell *lc;
> +
> + rel = try_relation_open(lfirst_oid(oid), NoLock);
> + if (!rel)
> This really meritates a comment. In short: why is it fine to not take
> a lock here? The answer I think is that even if the relation does not
> exist vacuum would do nothing, but this should be written out.

Right, we are just checking the existence of columns here. We lock it
later on with the understanding that the columns may have disappeared in
the meantime, in which case they will simply be skipped. I've added a

> + foreach(relation, relations)
> + {
> + if (((RelationAndColumns *) lfirst(relation))->va_cols != NIL &&
> + !(options & VACOPT_ANALYZE))
> Saving the data in a variable makes for a better style and easier
> debugging. When doing a bitwise operation, please use as well != 0 or
> == 0 as you are looking here for a boolean result.
> + relinfo = makeNode(RelationAndColumns);
> + relinfo->oids = list_make1_oid(HeapTupleGetOid(tuple));
> + *vacrels = lappend(*vacrels, relinfo);
> Assigning variables even for nothing is good practice for readability,
> and shows the intention behind the code.

I've made these changes.

> - * relid, if not InvalidOid, indicate the relation to process; otherwise,
> - * the RangeVar is used. (The latter must always be passed, because it's
> - * used for error messages.)
> + * If we intend to process all relations, the 'relations' argument may be
> + * NIL.
> This comment actually applies to RelationAndColumns. If the OID is
> invalid, then RangeVar is used, and should always be set. You are
> breaking that promise actually for autovacuum. The comment here should
> say that if relations is NIL all the relations of the database are
> processes, and for an ANALYZE all the columns are done.

Makes sense, I've tried to make this comment clearer.



Attachment Content-Type Size
vacuum_multiple_tables_v8.patch application/octet-stream 34.2 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2017-08-24 03:50:45 Re: proposal: psql command \graw
Previous Message Robert Haas 2017-08-24 02:43:41 Re: [PATCH] Push limit to sort through a subquery