Re: pg_amcheck contrib application

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_amcheck contrib application
Date: 2021-03-11 16:09:52
Message-ID: CA+TgmoakvqTJNCSXQ0Y+pMBa33Gmf2o53WGg816eFC7ju3MWRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 10, 2021 at 11:02 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> > The documentation says that -D "does exclude any database that was
> > listed explicitly as dbname on the command line, nor does it exclude
> > the database chosen in the absence of any dbname argument." The first
> > part of this makes complete sense to me, but I'm not sure about the
> > second part. If I type pg_amcheck --all -D 'r*', I think I'm expecting
> > that "rhaas" won't be checked. Likewise, if I say pg_amcheck -d
> > 'bob*', I think I only want to check the bob-related databases and not
> > rhaas.
>
> I think it's a tricky definitional problem. I'll argue the other side for the moment:
>
> If you say `pg_amcheck bob`, I think it is fair to assume that "bob" gets checked. If you say `pg_amcheck bob -d="b*" -D="bo*"`, it is fair to expect all databases starting with /b/ to be checked, except those starting with /bo/, except that since you *explicitly* asked for "bob", that "bob" gets checked. We both agree on this point, I think.

+1.

> If you say `pg_amcheck --maintenance-db=bob -d="b*" -D="bo*", you don't expect "bob" to get checked, even though it was explicitly stated.

I expect that specifying --maintenance-db has zero effect on what gets
checked. The only thing that should do is tell me which database to
use to get the list of databases that I am going to check, just in
case the default is unsuitable and will fail.

> If you are named "bob", and run `pg_amcheck`, you expect it to get your name "bob" from the environment, and check database "bob". It's implicit rather than explicit, but that doesn't change what you expect to happen. It's just a short-hand for saying `pg_amcheck bob`.

+1.

> Saying that `pg_amcheck -d="b*" -D="bo*" should not check "bob" implies that the database being retrieved from the environment is acting like a maintenance-db. But that's not how it is treated when you just say `pg_amcheck` with no arguments. I think treating it as a maintenance-db in some situations but not in others is strangely non-orthogonal.

I don't think I agree with this. A maintenance DB in my mind doesn't
mean "a database we're not actually checking," but rather "a database
that we're using to get a list of other databases."

TBH, I guess I actually don't know why we ever treat a bare
command-line argument as a maintenance DB. I probably wouldn't do
that. We should only need a maintenance DB if we need to query for a
list of database to check, and if the user has explicitly named the
database to check, then we do not need to do that... unless they've
also done something like -D or -d, but then the explicitly-specified
database name is playing a double role. It is both one of the
databases we will check, and also the database we will use to figure
out what other databases to check. I think that's why this seems
non-orthogonal.

Here's my proposal:

1. If there are options present which require querying for a list of
databases (e.g. --all, -d, -D) then use connectMaintenanceDatabase()
and go figure out what they mean. The cparams passed to that function
are only affected by the use of --maintenance-db, not by any bare
command line arguments. If there are no arguments present which
require querying for a list of databases, then --maintenance-db has no
effect.

2. If there is a bare command line argument, add the named database to
the list of databases to be checked. This might be empty if no
relevant options were specified in step 1, or if those options matched
nothing. It might be a noop if the named database was already selected
by the options mentioned in step 1.

3. If there were no options present which required querying for a list
of databases, and if there is also no bare command line argument, then
default to the checking whatever database we connect to by default.

With this approach, --maintenance-db only ever affects how we get the
list of databases to check, and a bare command-line argument only ever
specifies a database to be checked. That seems cleaner.

An alternate possibility would be to say that there should only ever
be EITHER a bare command-line argument OR options that require
querying for a list of databases OR neither BUT NOT both. Then it's
simple:

0. If you have both options which require querying for a list of
databases and also a bare database name, error and die.
1. As above.
2. As above except the only possibility is now increasing the list of
target databases from length 0 to length 1.
3. As above.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2021-03-11 16:26:06 Re: WIP: BRIN multi-range indexes
Previous Message David G. Johnston 2021-03-11 16:08:47 Re: documentation fix for SET ROLE