Re: A few new options for vacuumdb

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A few new options for vacuumdb
Date: 2019-01-08 18:46:11
Message-ID: FF67D651-A7AF-4E70-9401-805E3CFB4DC9@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/7/19, 1:12 AM, "Michael Paquier" <michael(at)paquier(dot)xyz> wrote:
> I have been looking at the patch set, and 0001 can actually happen
> only once 0005 is applied because this modifies the query doing on
> HEAD a full scan of pg_class which would include at least catalog
> tables so it can never be empty. For this reason it seems to me that
> 0001 and 0004 should be merged together as common refactoring, making
> sure on the way that all relations exist before processing anything.
> As 0005 and 0006 change similar portions of the code, we could also
> merge these together in a second patch introducing the new options.
> Keeping 0001, 0004, 0005 and 0006 separated eases the review, but I
> would likely merge things when they make sense together to reduce
> diff chunks.

Thanks for reviewing. Sure, I can merge these together so that it's
just 2 patches.

> 0002 removes some carefully-written query generation, so as it is
> never possible to generate something like ANALYZE FULL. HEAD splits
> ANALYZE and VACUUM clearly, but 0002 merges them together so as
> careless coding in vacuumdb.c makes it easier to generate command
> strings which would fail in the backend relying on the option checks
> to make sure that for example combining --full and --analyze-only
> never happens. Introducing 0002 is mainly here for 0003, so let's
> merge them together.

Makes sense. I was trying to simplify this code a bit, but I agree
with your point about relying on the option checks.

> From patch 0004:
> + executeCommand(conn, "RESET search_path;", progname, echo);
> + res = executeQuery(conn, catalog_query.data, progname, echo);
> + termPQExpBuffer(&catalog_query);
> + PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL,
> + progname, echo));
> We should really avoid that. There are other things like casts which
> tend to be forgotten, and if the catalog lookup query gets more
> complicated, I feel that this would again be forgotten, reintroducing
> the issues that ALWAYS_SECURE_SEARCH_PATH_SQL is here to fix.

This was done in order to maintain the current behavior that
appendQualifiedRelation() gives us. I found that skipping the
search_path handling here forced us to specify the schema in the
argument for --table in most cases. At the very least, I could add a
comment here to highlight the importance of fully qualifying
everything in the catalog query. What do you think?

> I have put my hands into what you sent, and worked on putting
> 0002/0003 first into shape, finishing with the attached. I have
> simplified the query construction a bit, making it IMO easier to read
> and to add new options, with comments documenting what's supported
> across versions. I have also added a regression test for
> --analyze-only --skip-locked. Then I have done some edit of the docs.
> What do you think for this portion?

Looks good to me, except for one small thing in the documentation:

+ <para>
+ Disable all page-skipping behavior during processing based on
+ the visibility map, similarly to the option
+ <literal>DISABLE_PAGE_SKIPPING</literal> for <command>VACUUM</command>.
+ </para>

I think the "similarly to the option" part is slightly misleading.
It's not just similar, it _is_ using that option in the generated
commands. Perhaps we could point to the VACUUM documentation for more
information about this one.

On 1/7/19, 8:03 PM, "Michael Paquier" <michael(at)paquier(dot)xyz> wrote:
> On Mon, Jan 07, 2019 at 06:10:21PM +0900, Masahiko Sawada wrote:
>> 0002 and 0003 are merged and posted by Michael-san and it looks good
>> to me, so I've looked at the 0001, 0004, 0005 and 0006 patches. Here
>> is a few review comments.
>
> I have done another round on 0002/0003 (PQfinish was lacking after
> error-ing on incompatible options) and committed the thing.

Thanks!

>> Even if all tables are filtered by --min-relation-size, min-mxid-age
>> or min-xid-age the following message appeared.
>>
>> $ vacuumdb --verbose --min-relation-size 1000000 postgres
>> vacuumdb: vacuuming database "postgres"
>>
>> Since no tables are processed in this case isn't is better not to
>> output this message by moving the message after checking if ntup ==
>> 0?
>
> Hmm. My take on this one is that we attempt to vacuum relations on
> this database, but this may finish by doing nothing. Printing this
> message is still important to let the user know that an attempt was
> done so I would keep the message.

+1 for keeping the message.

>> You use pg_total_relation_size() to check the relation size but it
>> might be better to use pg_relation_size() instead. The
>> pg_total_relation_size() includes the all index size but I think it's
>> common based on my experience that we check only the table size
>> (including toast table) to do vacuum it or not.
>
> Yes, I am also not completely sure if the way things are defined in
> the patch are completely what we are looking for. Still, I have seen
> as well people complain about the total amount of space a table is
> physically taking on disk, including its indexes. So from the user
> experience the direction taken by the patch makes sense, as much as
> does the argument you do.

Good point. I think allowing multiple different relation size options
here would be confusing, too (e.g. --min-relation-size versus
--min-total-relation-size). IMO pg_total_relation_size() is the way
to go here, as we'll most likely need to process the indexes and TOAST
tables, too. If/when there is a DISABLE_INDEX_CLEANUP option for
VACUUM, we'd then want to use pg_table_size() when --min-relation-size
and --disable-index-cleanup are used together in vacuumdb. Thoughts?

I've realized that I forgot to add links to the XID/MXID wraparound
documentation like you suggested a while back. I'll make sure that
gets included in the next patch set, too.

Nathan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-01-08 19:21:14 Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Previous Message John Naylor 2019-01-08 18:41:16 Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)