Re: pg14 psql broke \d datname.nspname.relname

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg14 psql broke \d datname.nspname.relname
Date: 2021-10-13 20:43:53
Message-ID: F9EC168B-1A29-421E-BB07-AE1C4D38152F@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Oct 13, 2021, at 8:43 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Oct 13, 2021 at 10:40 AM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> 3a is a bit strange, when considered in the context of patterns. If db1, db2, and db3 all exist and each have a table foo.bar, and psql is connected to db1, how should the command \d db?.foo.bar behave? We have no problem with db1.foo.bar, but we do have problems with the other two. If the answer is to complain about the databases that are unconnected, consider what happens if the user writes this in a script when only db1 exists, and later the script stops working because somebody created database db2. Maybe that's not completely horrible, but surely it is less than ideal.
>>
>> 3b is what pg_amcheck does. It accepts database.schema.relname, and it will complain if no matching database/schema/relation can be found (unless --no-strict-names was given.)
>
> Well, like I said, we can't treat a part that's purportedly a DB name
> as a pattern, so when connected to db1, I presume the command \d
> db?.foo.bar would have to behave just like \d
> dskjlglsghdksgdjkshg.foo.bar. I suppose technically I'm wrong: db?
> could be matched against the list of database names as a pattern, and
> then we could complain only if it doesn't match exactly and only the
> current DB. But I don't like adding a bunch of extra code to
> accomplish nothing useful, so if we're going to match it all I think
> it should just strcmp().
>
> But I'm still not sure what the best thing to do overall is here.

The issue of name parsing impacts pg_dump and pg_dumpall, also. Consider what happens with:

pg_dump -t production.critical.secrets > secrets.dump
dropdb production

In v13, if your default database is "testing", and database "testing" has the same schemas and tables (but not data) as production, you are unhappy. You just dumped a copy of your test data and blew away the production data.

You could end up unhappy in v14, if database "testing" has a schema named "production" and a table that matches the pattern /^critical.secrets$/, but otherwise, you'll get an error from pg_dump, "pg_dump: error: no matching tables were found". Neither behavior seems correct.

The function where the processing occurs is processSQLNamePattern, which is called by pg_dump, pg_dumpall, and psql. All three callers expect processSQLNamePattern to append where-clauses to a buffer, not to execute any sql of its own. I propose that processSQLNamePattern return an error code if the pattern contains more than three parts, but otherwise insert the database portion into the buffer as a "pg_catalog.current_database() OPERATOR(pg_catalog.=) <database>", where <database> is a properly escaped representation of the database portion. Maybe someday we can change that to OPERATOR(pg_catalog.~), but for now we lack the sufficient logic for handling multiple matching database names. (The situation is different for pg_dumpall, as it's using the normal logic for matching a relation name, not for matching a database, and we'd still be fine matching that against a pattern.)

For psql and pg_dump, I'm tempted to restrict the database portion (if not quoted) to neither contain shell glob characters nor POSIX regex characters, and return an error code if any are found, so that the clients can raise an appropriate error to the user.

In psql, this proposal would result in no tables matching \d wrongdb.schema.table, which would differ from v13's behavior. You wouldn't get an error about having specified the wrong database. You'd just get no matching relations. \d ??db??.schema.table would complain about the db portion being a pattern. \d "??db??".schema.table would work, assuming you're connected to a database literally named ??db??

In pg_dumpall, --exclude-database=more.than.one.part would give an error about too many dotted parts rather than simply trying to exclude the last "part" and silently ignoring the prefix, which I think is what v13's pg_dumpall would do. --exclude-database=db?? would work to exclude four character database names beginning in "db".

In pg_dump, the -t wrongdb.schema.table would match nothing and give the familiar error "pg_dump: error: no matching tables were found". pg_dump -t too.many.dotted.names would give a different error about too many parts. pg_dump -t db??.foo.bar would give an error about the database needing to be a literal name rather than a pattern.

I don't like your proposal to use a strcmp() rather than a pg_catalog.= match, because it diverges from how the rest of the pattern is treated, including in how encoding settings might interact with the name, needing to be executed on the client side rather than in the server where the rest of the name resolution is happening.

Does this sound like a workable proposal?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2021-10-13 20:46:53 Re: prevent immature WAL streaming
Previous Message Alvaro Herrera 2021-10-13 20:42:37 Re: prevent immature WAL streaming