Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Vaishnavi Prabakaran <vaishnaviprabakaran(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall
Date: 2018-01-17 23:50:32
Message-ID: 1904.1516233032@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> writes:
> [ pg_dump-and-pg_dumpall-database-handling-refactoring_v12.patch ]

I've gone through this in a once-over-lightly fashion. Since there was
quite a bit of debate upthread about how things should work, I'd like
to start by summarizing the decisions this patch has made, in case
anyone still wants to object to them:

* pg_dump will now be responsible for dumping all per-database properties,
including "ALTER ROLE IN DATABASE SET" settings. To get that behavior,
pg_dumpall will invoke pg_dump using -C, or using the new switch
--set-db-properties when working on "template1" or "postgres" databases,
since those should already exist in the target installation.

* pg_dumpall still won't dump template0 (or any other not-datallowconn
database). This means that it will no longer propagate any non-default
per-database properties for such databases. I think this is fine for
template0: if you've messed with that at all, you are fooling with non
user-serviceable parts. It's a bit less good for other DBs maybe, but on
the whole it seems more consistent to simply treat nonconnectable DBs
as not existing. (That could stand to be documented though.)

* "pg_dumpall -g" will now produce only role- and tablespace-related output.

* There isn't any direct way to ask pg_dump for "just the DB properties
and nothing else" (although I suppose you can mostly fake it by adding
"--schema nosuchschema" or some such). This makes it inconvenient to
exactly duplicate the old behavior of "pg_dumpall -g", if someone wanted
to do that to fit into their existing backup procedures. I'm not sure
how important that is. Personally I'm content to lose it, but if there's
enough pushback maybe we could invent a "--db-properties-only" switch?

There are some other points that haven't been debated:

* As the patch stands, --set-db-properties is implicit when you specify
-C, and in fact the patch goes to the trouble of throwing an error if you
try to specify both switches. I'm inclined to think this might be a bad
idea. What about saying that -C enables emitting CREATE DATABASE and
reconnecting to that DB, and independently of that, --set-db-properties
enables emitting the additional per-database properties? This seems
simpler to understand, more flexible, and less of a change from the
previous behavior of -C. On the other hand you could argue that people
would always want --set-db-properties with -C and so we're merely
promoting carpal tunnel (and errors of omission) if we do it like that.
If so, maybe we could say "-C implies --set-db-properties by default, but
if you really don't want that, you can say -C --no-set-db-properties".
Perhaps the only application of this is to reproduce pg_dump's historical
behavior, but that's probably of some value.

* The patch fails to make any provision for deciding at pg_restore time
whether to emit DB properties. Considering that you can use -C at restore
time, I think it's pretty awful that you can't use --set-db-properties
then. Moreover, not only has the patch not added a separate "DATABASE
PROPERTIES" TOC entry type as was originally proposed, what it's actually
doing is emitting two identically labeled "DATABASE" TOC entries, one
with the CREATE and the other with the rest. That's simply horrid.
I think we need to do this properly and emit two distinguishable TOC
entries, and control which of them get printed on the restore side of the
logic, not the TOC entry construction side. (I'm not sure at this point
if we need an archive version bump to do that, but perhaps not --- in the
past we've added new TOC types without a version bump.)

* Some years ago, commit 4bd371f6f caused pg_dumpall to emit "SET
default_transaction_read_only = off;" in its scripts, so that it could
successfully handle DBs that have "SET default_transaction_read_only = on"
as a database property. This was intentionally NOT done to pg_dump,
arguing that doing so would greatly weaken such a setting as a defense
against stupid errors (i.e. restoring into your production database).
The patch currently ignores that reasoning and moves the setting into
pg_dump anyway. Haribabu mentioned this point upthread and got basically
no response, but I'm concerned about it.

I think however that we might be able to dodge that issue as a byproduct
of fixing the previous problem. If the CREATE DATABASE is done as one
TOC entry, and we reconnect to the target DB after processing that entry,
and then we process the DATABASE PROPERTIES entry, then any DB property
settings we've installed will not apply in our existing session. So I
think we can just leave out the "SET default_transaction_read_only = on"
altogether. Obviously this puts a premium on not reconnecting, but
reconnecting would break other more important features like
--single-transaction, so I am not worried about that.

* An issue in connection with that is that because you can't issue
ALTER DATABASE SET TABLESPACE against the current DB, pg_dumpall
currently takes the trouble to explicitly reconnect to a different DB
and then back to the target DB when issuing that command. That doesn't
play nice with my suggestion above, nor with --single-transaction.
I'm still thinking about how to fix that, but probably the best fix
involves handling non-default tablespaces by including them in the
initial CREATE DATABASE command, plus emitting an ALTER DATABASE SET
TABLESPACE in a separate new "DATABASE TABLESPACE" TOC object, which
we would have to teach the pg_restore logic to handle correctly.
At minimum it'd need to be aware about the interaction with
--single-transaction; since we have to have that anyway, we could also
make it do the extra-reconnections dance instead of hard-wiring \connect
commands into the text of the entry, and/or skip the whole thing if it
knows it emitted the CREATE DATABASE command.

Comments? If there's not objections I plan to push forward on this.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2018-01-18 00:00:49 Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Previous Message Tomas Vondra 2018-01-17 23:48:49 Re: [HACKERS] GnuTLS support