Re: pg14 psql broke \d datname.nspname.relname

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg14 psql broke \d datname.nspname.relname
Date: 2021-10-20 14:15:25
Message-ID: 300422FF-60E3-4EF1-BE8B-16D302B2CE6C@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On Oct 13, 2021, at 1:43 PM, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
> The issue of name parsing impacts pg_dump and pg_dumpall, also. Consider what happens with:
>
> pg_dump -t production.critical.secrets > secrets.dump
> dropdb production
>
> In v13, if your default database is "testing", and database "testing" has the same schemas and tables (but not data) as production, you are unhappy. You just dumped a copy of your test data and blew away the production data.
>
> You could end up unhappy in v14, if database "testing" has a schema named "production" and a table that matches the pattern /^critical.secrets$/, but otherwise, you'll get an error from pg_dump, "pg_dump: error: no matching tables were found". Neither behavior seems correct.

With the attached patch, this scenario results in a "cross-database references are not implemented" error.

> The function where the processing occurs is processSQLNamePattern, which is called by pg_dump, pg_dumpall, and psql. All three callers expect processSQLNamePattern to append where-clauses to a buffer, not to execute any sql of its own. I propose that processSQLNamePattern return an error code if the pattern contains more than three parts, but otherwise insert the database portion into the buffer as a "pg_catalog.current_database() OPERATOR(pg_catalog.=) <database>", where <database> is a properly escaped representation of the database portion. Maybe someday we can change that to OPERATOR(pg_catalog.~), but for now we lack the sufficient logic for handling multiple matching database names. (The situation is different for pg_dumpall, as it's using the normal logic for matching a relation name, not for matching a database, and we'd still be fine matching that against a pattern.)

I ultimately went with your strcmp idea rather than OPERATOR(pg_catalog.=), as rejecting the database name as part of the query complicates the calling convention for no apparent benefit. I had been concerned about database names that were collation-wise equal but byte-wise unequal, but it seems we already treat those as distinct database names, so my concern was unnecessary. We already use strcmp on database names from frontend clients (fe_utils/parallel_slots.c, psql/prompt.c, pg_amcheck.c, pg_dump.c, pg_upgrade/relfilenode.c), from libpq (libpq/hba.c) and from the backend (commands/dbcommands.c, init/postinit.c).

I tried testing how this plays out by handing `createdb` the name é (U+00E9 "LATIN SMALL LETTER E WITH ACCUTE") and then again the name é (U+0065 "LATIN SMALL LETTER E" followed by U+0301 "COMBINING ACCUTE ACCENT".) That results in two distinct databases, not an error about a duplicate database name:

# select oid, datname, datdba, encoding, datcollate, datctype from pg_catalog.pg_database where datname IN ('é', 'é');
oid | datname | datdba | encoding | datcollate | datctype
-------+---------+--------+----------+-------------+-------------
37852 | é | 10 | 6 | en_US.UTF-8 | en_US.UTF-8
37855 | é | 10 | 6 | en_US.UTF-8 | en_US.UTF-8
(2 rows)

But that doesn't seem to prove much, as other tools in my locale don't treat those as equal either. (Testing with perl's "eq" operator, they compare as distinct.) I expected to find regression tests providing better coverage for this somewhere, but did not. Anybody know more about it?

> For psql and pg_dump, I'm tempted to restrict the database portion (if not quoted) to neither contain shell glob characters nor POSIX regex characters, and return an error code if any are found, so that the clients can raise an appropriate error to the user.

With the patch, using pattern characters in an unquoted database portion results in a "database name must be literal" error. Using them in a quoted database name is allowed, but unless you are connected to a database that literally equals that name, you will get a "cross-database references are not implemented" error.

> In psql, this proposal would result in no tables matching \d wrongdb.schema.table, which would differ from v13's behavior. You wouldn't get an error about having specified the wrong database. You'd just get no matching relations. \d ??db??.schema.table would complain about the db portion being a pattern. \d "??db??".schema.table would work, assuming you're connected to a database literally named ??db??

With the patch, psql will treat \d wrongdb.schema.table as a "cross-database references are not implemented" error.

> In pg_dumpall, --exclude-database=more.than.one.part would give an error about too many dotted parts rather than simply trying to exclude the last "part" and silently ignoring the prefix, which I think is what v13's pg_dumpall would do. --exclude-database=db?? would work to exclude four character database names beginning in "db".

The patch implements this.

> In pg_dump, the -t wrongdb.schema.table would match nothing and give the familiar error "pg_dump: error: no matching tables were found".

With the patch, pg_dump instead gives a "cross-database references are not implemented" error.

> pg_dump -t too.many.dotted.names would give a different error about too many parts.

With the patch, pg_dump instead gives a "improper qualified name (too many dotted names)" error.

> pg_dump -t db??.foo.bar would give an error about the database needing to be a literal name rather than a pattern.

With the patch, pg_dump gives a "database name must be literal" error. This is the only new error message in the patch, which puts a burden on translators, but I didn't see any existing message that would serve. Suggestions welcome.

> I don't like your proposal to use a strcmp() rather than a pg_catalog.= match, because it diverges from how the rest of the pattern is treated, including in how encoding settings might interact with the name, needing to be executed on the client side rather than in the server where the rest of the name resolution is happening.

Recanted, as discussed above.

The patch only changes the behavior of pg_amcheck in that it now rejects patterns with too many parts. Using database patterns was and remains legal for this tool.

The patch changes nothing about reindexdb. That's a debatable design choice, but reindexdb doesn't use string_utils's processSQLNamePattern() function as the other tools do, nor does its documentation reference psql's #APP-PSQL-PATTERNS documentation. It's --schema option only takes literal names.

Attachment Content-Type Size
v1-0001-Reject-patterns-with-too-many-parts-or-wrong-db.patch application/octet-stream 76.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anna Akenteva 2021-10-20 14:53:02 Some questions about schema privileges
Previous Message Andrew Dunstan 2021-10-20 13:53:38 Re: cursor use vs pg_stat_statements