Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Vaishnavi Prabakaran <vaishnaviprabakaran(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall
Date: 2018-01-21 17:42:11
Message-ID: 24388.1516556531@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've been hacking away at this patch, and attached is what I've got
so far. I think this is committable, but if anyone wants to do
further review and testing, that'd be fine.

Per discussion, I got rid of the separate --set-db-properties switch:
additional database properties are now applied if and only if --create is
specified. But the DATABASE TOC entry itself still contains only CREATE
DATABASE; the additional commands are carried in a "DATABASE PROPERTIES"
TOC entry so that they will be issued after reconnecting to the new DB.
This dodges the "ALTER DATABASE SET default_transaction_read_only"
problem. (Furthermore, it turns out that we have to do it like that
because pg_restore issues any one TOC entry's contents as a single PQexec.
If you try to cram anything else in with the CREATE DATABASE, then the
server spits up because CREATE DATABASE becomes part of an implicit
transaction block.)

I also fixed it so that a database's comment, security label, and ACL are
restored only when saying --create. This is different from the previous
behavior for DB comments and security labels, but it seems a great deal
more useful and consistent. In no other case would pg_dump/pg_restore
attempt to restore a comment, security label, or ACL for an object it
hadn't created, so why should it act differently for databases?

Worth noting here is that if we accept that behavior, the problem for
which "COMMENT ON CURRENT_DATABASE" was proposed goes away, because
there's no longer a risk of trying to apply COMMENT ON DATABASE to the
wrong database. We might still want that patch as a standalone feature,
but pg_dump no longer needs it.

Another point worth commenting on (this change was also in the v13 patch)
is that pg_dumpall formerly took some pains to omit the encoding and
locale specifications in its CREATE DATABASE commands, for databases whose
settings matched the cluster default. This new implementation won't do
that. We could change it to do so, but then a standalone "pg_dump
--create" would also act that way, which is not really easy to defend.
IIRC, the argument for pg_dumpall doing it like that was to make it
less painful to migrate a cluster to a new machine that might not have the
identical set of locales, or if one wanted to migrate everything to a new
encoding. But those are not mainstream use-cases, and if you really need
to do that you can still get there by dumping databases individually
without using --create. On the other hand, there are obvious gotchas
involved in letting a dump/reload silently change to another locale or
encoding. So I think this is an improvement overall, but it bears
noting as a behavioral change.

Another point to note is that I dithered about whether to bump the
pg_dump archive version number, which would have the effect of preventing
pre-v11 versions of pg_restore from processing dumps made by this version.
The argument for breaking archive compatibility is that older pg_restore
versions will not know that it'd be a good idea to skip DATABASE
PROPERTIES TOC entries and database ACL entries if not using --create.
However, in default cases there won't be a DATABASE PROPERTIES entry to
skip. Moreover, applying these entries unconditionally isn't that much
different conceptually from applying database comments or sec labels
unconditionally, as such older pg_restore versions would do anyway.
It also seems like if your back were against the wall, being able to
read a newer archive file with an older pg_restore is better than being
locked out of it completely. So I'm leaning to no version bump, but
it could still be discussed.

One thing we could possibly use here is more regression test cases.
The only existing regression test that's affected by this patch is
002_pg_dump.pl's expectations about which cases will print a database
comment, so it seems like we're missing some behavioral coverage.
Not sure what that should look like though.

This patch has to be applied over the patches proposed in
https://www.postgresql.org/message-id/21714.1516553459%40sss.pgh.pa.us
Aside from touching some of the same code, this is dependent on
the changes made there to make comment, seclabel, and ACL entries
reliably identifiable.

As far as notable code changes go, I got rid of the previous patch's
move of executeQuery() into dumputils.c. That had some undesirable
knock-on effects in terms of creating even more coupling between
different modules (through the g_verbose global), and it was basically
misguided anyway. pg_dump executes queries via ExecuteSqlQuery; we
do not need a few of its functions to be using low-level routines with
behavior different from that.

If anyone wants to do further review on this, let me know; otherwise
I'll push it in a day or so.

regards, tom lane

Attachment Content-Type Size
pg_dump-and-pg_dumpall-refactor-v14.patch text/x-diff 66.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-01-21 18:02:38 Re: [HACKERS] Supporting huge pages on Windows
Previous Message Tom Lane 2018-01-21 16:50:59 Bogus tags for comments, ACLs, and security labels in pg_dump