Re: A few new options for vacuumdb

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A few new options for vacuumdb
Date: 2019-01-23 03:41:11
Message-ID: 20190123034111.GH3873@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 22, 2019 at 11:21:32PM +0000, Bossart, Nathan wrote:
> On 1/21/19, 10:08 PM, "Michael Paquier" <michael(at)paquier(dot)xyz> wrote:
> > On Mon, Jan 21, 2019 at 10:09:09PM +0000, Bossart, Nathan wrote:
> >> Here's a new patch set that should address the feedback in this
> >> thread. The changes in this version include:
> >>
> >> - 0001 is a small fix to the 'vacuumdb --disable-page-skipping'
> >> documentation. My suggestion is to keep it short and simple like
> >> --full, --freeze, --skip-locked, --verbose, and --analyze. The
> >> DISABLE_PAGE_SKIPPING option is well-described in the VACUUM
> >> documentation, and IMO it is reasonably obvious that such vacuumdb
> >> options correspond to the VACUUM options.
> >
> > Okay, committed this one.
>
> Thanks.
>
> > Regarding the search_path business, there is actually similar business
> > in expand_table_name_patterns() for pg_dump if you look close by, so
> > as users calling --table don't have to specify the schema, and just
> > stick with patterns.
>
> I see. The "WHERE c.relkind..." clause looks a bit different than
> what I have, so I've altered vacuumdb's catalog query to match. This
> was changed in expand_table_name_patterns() as part of 582edc369cd for
> a security fix, so we should probably do something similar here.
>
> > + /*
> > + * Prepare the list of tables to process by querying the catalogs.
> > + *
> > + * Since we execute the constructed query with the default search_path
> > + * (which could be unsafe), it is important to fully qualify everything.
> > + */
> > It is not only important, but also absolutely mandatory, so let's make
> > the comment insisting harder on that point. From this point of view,
> > the query that you are building is visibly correct.
>
> I've updated the comment as suggested.
>
> > + appendStringLiteralConn(&catalog_query, just_table, conn);
> > + appendPQExpBuffer(&catalog_query, "::pg_catalog.regclass\n");
> > +
> > + pg_free(just_table);
> > +
> > + cell = cell->next;
> > + if (cell == NULL)
> > + appendPQExpBuffer(&catalog_query, " )\n");
> > Hm. It seems to me that you are duplicating what
> > processSQLNamePattern() does, so we ought to use it. And it is
> > possible to control the generation of the different WHERE clauses with
> > a single query based on the number of elements in the list. Perhaps I
> > am missing something? It looks unnecessary to export
> > split_table_columns_spec() as well.
>
> I'm sorry, I don't quite follow this. AFAICT processSQLNamePattern()
> is for generating WHERE clauses based on a wildcard-pattern string for
> psql and pg_dump. This portion of the patch is generating a WHERE
> clause to filter only for the tables listed via --table, and I don't
> think the --table option currently accepts patterns. Since you can
> also provide a list of columns with the --table option, I am using
> split_table_columns_spec() to retrieve only the table name for the OID
> lookup.

Bah, of course you are right. vacuumdb does not support patterns but
I thought it was able to handle that. With column names that would be
sort of funny anyway.

>> Or we could just use "ANALYZE \.;/", perhaps patching it first.
>> Perhaps my regexp here is incorrect, but I just mean to check for an
>> ANALYZE query which begins by "ANALYZE " and finishes with ";",
>> without caring about what is in-between. This would make the tests
>> more portable.
>
> Sure, I've split this out into a separate patch.

Thanks, I have committed this part to ease the follow-up work.

> I think it's already pretty unlikely that any matching relations would
> be found, but I've updated these values to be within the range where
> any matching database has already stopped accepting commands to
> prevent wraparound.

Thanks.

I have been looking at 0002, and I found a problem with the way
ANALYZE queries are generated. For example take this table:
CREATE TABLE aa1 (a int);

Then if I try to run ANALYZE with vacuumdb it just works:
$ vacuumdb -z --table 'aa1(b)'
vacuumdb: vacuuming database "ioltas"

Note that this fails with HEAD, but passes with your patch. The
problem is that the query generated misses the lists of columns when
processing them through split_table_columns_spec(), as what is
generated is that:
VACUUM (ANALYZE) public.aa1;

So the result is actually incorrect because all the columns get
processed.

This patch moves the check about the existence of the relation when
querying the catalogs, perhaps we would want the same for columns for
consistency? Or not. That's a bit harder for sure, not impossible
visibly, still that would mean duplicating a bunch of logic that the
backend is doing by itself, so we could live without it I think.

Attached is a first patch to add more tests in this area, which passes
on HEAD but fails with your patch. It looks like a good idea to tweak
the tests first as there is no coverage yet for the queries generated
with complete and partial column lists.

At the same time, we may want to change split_table_columns_spec's
name to be more consistent with the other APIs, like
splitTableColumnspec. That's a nit though..
--
Michael

Attachment Content-Type Size
vacuumdb-test-cols.patch text/x-diff 1.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-01-23 04:09:30 Re: Undo worker and transaction rollback
Previous Message Michael Paquier 2019-01-23 03:01:10 Re: Typo: llvm*.cpp files identified as llvm*.c