Re: [Proposal] Allow users to specify multiple tables in VACUUM commands

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: [Proposal] Allow users to specify multiple tables in VACUUM commands
Date: 2017-08-29 04:26:03
Message-ID: CAB7nPqSC5AuFxRA6Gg+yo9EJ4nDAXuT1_uaNCTyuZQFiDCVoNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Aug 26, 2017 at 8:00 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> What happens if you say VACUUM partitioned_table (a), some_partition (b)?

Using v9, if you do that:
=# CREATE TABLE parent (id int) PARTITION BY RANGE (id);
CREATE TABLE
=# CREATE TABLE child_1_10001 partition of parent for values from (1)
to (10001);
CREATE TABLE
=# CREATE TABLE child_10001_20001 partition of parent for values from
(10001) to (20001);
CREATE TABLE
=# insert into parent values (generate_series(1,20000));
INSERT 0 20000

Vacuuming the parent vacuums all the children, so any child listed
would get vacuumed twice, still this does not cause an error:
=# vacuum parent, child_10001_20000;
VACUUM
And with the de-duplication patch on top of it, things are vacuumed only once.

On Tue, Aug 29, 2017 at 7:56 AM, Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
> On 8/23/17, 11:59 PM, "Michael Paquier" <michael(dot)paquier(at)gmail(dot)com> wrote:
>> + * relations is a list of VacuumRelations to process. If it is NIL, all
>> + * relations in the database are processed.
>> Typo here, VacuumRelation is singular.
>
> This should be fixed in v9.
>
> On 8/24/17, 5:45 PM, "Michael Paquier" <michael(dot)paquier(at)gmail(dot)com> wrote:
>> This makes me think that it could be a good idea to revisit this bit
>> in a separate patch. ANALYZE fails as well now when the same column is
>> defined multiple times with an incomprehensible error message.
>
> The de-duplication code is now in a separate patch,
> dedupe_vacuum_relations_v1.patch. I believe it fixes the incomprehensible
> error message you were experiencing, but please let me know if you are
> still hitting it.

It looks that problems in this area are fixed using the second patch.

> On 8/25/17, 6:00 PM, "Robert Haas" <robertmhaas(at)gmail(dot)com> wrote:
>> + oldcontext = MemoryContextSwitchTo(vac_context);
>> + foreach(lc, relations)
>> + temp_relations = lappend(temp_relations, copyObject(lfirst(lc)));
>> + MemoryContextSwitchTo(oldcontext);
>> + relations = temp_relations;
>>
>> Can't we just copyObject() on the whole list?
>
> I've made this change.
>
>> - ListCell *cur;
>> -
>>
>> Why change this? Generally, declaring a separate variable in an inner
>> scope seems like better style than reusing one that happens to be
>> lying around in the outer scope.
>
> I've removed this change.
>
>> + VacuumRelation *relinfo = (VacuumRelation *) lfirst(lc);
>>
>> Could use lfirst_node.
>
> Done.
>
> On 8/28/17, 5:28 PM, "Michael Paquier" <michael(dot)paquier(at)gmail(dot)com> wrote:
>> Yes, if I understand that correctly. That's the point I am exactly
>> coming at. My suggestion is to have one VacuumRelation entry per
>> relation vacuumed, even for partitioned tables, and copy the list of
>> columns to each one.
>
> I've made this change in v9. It does clean up the patch quite a bit.

Here is some input for vacuum_multiple_tables_v9, about which I think
that we are getting to something committable. Here are some minor
comments.

<para>
- With no parameter, <command>VACUUM</command> processes every table in the
+ With no parameters, <command>VACUUM</command> processes every table in the
current database that the current user has permission to vacuum.
- With a parameter, <command>VACUUM</command> processes only that table.
+ With parameters, <command>VACUUM</command> processes only the tables
+ specified.
</para>
The part about parameters looks fine to me if unchanged.

+ foreach(lc, relations)
+ {
+ VacuumRelation *relation = lfirst_node(VacuumRelation, lc);
+ if (relation->va_cols != NIL && (options & VACOPT_ANALYZE) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("ANALYZE option must be specified when a
column list is provided")));
+ }
Could you add a hint with the relation name involved here? When many
relations are defined in the VACUUM query this would be useful for the
user.

+ tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ classForm = (Form_pg_class) GETSTRUCT(tuple);
+ include_parts = (classForm->relkind == RELKIND_PARTITIONED_TABLE);
+ ReleaseSysCache(tuple);
It pains me to see that get_rel_relkind does not return an error if
the relation is missing, we could use it here. I would welcome a
refactoring with a missing_ok argument a lot! Now this patch for
VACUUM does not justify breaking potentially many extensions...

+ relinfo = makeNode(VacuumRelation);
+ rel_oid = HeapTupleGetOid(tuple);
+ relinfo->oid = rel_oid;
There are 4 patterns like that in the patch. We could make use of a
makeVacuumRelation.

About the de-duplication patch, I have to admit that I am still not a
fan of doing such a thing. Another road that we could take is to
simply complain with a proper error message if:
- the same column name is specified twice for a relation.
- the same relation is defined twice. In the case of partitions, we
could track the fact that it is already listed as part of a parent,
though perhaps it does not seem worth the extra CPU cost especially
when there are multiple nesting levels with partitions.
Autovacuum has also the advantage, if I recall correctly, to select
all columns for analyze, and skip parent partitions when scanning for
relations so that's a safe bet from this side. Opinions welcome.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2017-08-29 05:08:23 Re: Proposal: Improve bitmap costing for lossy pages
Previous Message Mithun Cy 2017-08-29 03:14:28 Re: NoMovementScanDirection in heapgettup() and heapgettup_pagemode()