pg_dumpall --clean versus roles and shared dependencies

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: pg_dumpall --clean versus roles and shared dependencies
Date: 2005-07-31 17:51:33
Message-ID: 29417.1122832293@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've been trying to figure out what to do about pg_dumpall's --clean
option in view of our recent changes. The problem is that pg_dumpall
tries to delete existing users and groups by putting this in its
output script:

DELETE FROM pg_shadow
WHERE usesysid <> (SELECT datdba FROM pg_database
WHERE datname = 'template0');
DELETE FROM pg_group;

CVS tip of course will just respond to these with
ERROR: cannot delete from a view
HINT: You need an unconditional ON DELETE DO INSTEAD rule.

So we have two problems: what do we want 8.1 pg_dumpall to do instead,
and what are we going to do about legacy pg_dump scripts that already
contain these commands?

A couple of relevant points:

* Nowhere else do pg_dump and pg_dumpall interpret --clean as a license
for a scorched-earth policy; rather, it means "drop the specific objects
you are going to re-create". One could argue that these commands are
therefore wrong by design, and what we should emit instead is DROP ROLE
commands for just the individual roles we are going to create.

* In view of the shared-dependencies patch, it is *highly* likely that
some or all of the deletions would fail anyway, due to the users owning
objects or permissions that haven't been deleted (yet). We could reduce
the risk of this by emitting DROP DATABASE commands before the DROP ROLE
commands, but of course this doesn't fix things if there are additional
databases in the target installation.

I am strongly tempted to propose that --clean is wrongheaded when it
comes to roles, and that pg_dumpall should just always emit all role
information in the style
CREATE ROLE foo;
ALTER ROLE foo WITH ... options ... ;
which will have the effect of ensuring that the role exists with all the
desired settings whether it pre-existed or not.

Role membership data is a different story. Part of the effect of
DELETE FROM pg_group;
was to eliminate group membership data as well as the groups themselves.
As of CVS tip, I have the code doing this instead:
DELETE FROM pg_auth_members;
followed by GRANT commands to restore membership links. I do not like
this solution though. In the first place, it is still following a
scorched-earth policy, which will completely mess up any pre-existing
groups in the destination installation, even (or especially) if they
are unrelated to what the dump script is loading. In the second place,
this is certainly failing to learn from experience: we should not have
the output scripts presuming such familiarity with system catalogs of
future Postgres releases.

One possibility is to invent a "REVOKE role FROM *" kind of command
and have --clean mode issue that for each role being reloaded.

This still leaves us with the question of "what about the DELETE
commands in existing scripts?". I thought about adding rules and
triggers to try to make those operations do approximately what is
intended, but I fear it's a hopeless cause: because of shared
dependencies there is little or no likelihood that *all* of the
individual user drops will succeed, and since they would necessarily
be happening in a single transaction, that means none of them will.
So my current idea is to just ignore the problem: those commands
won't do what is intended but they should be relatively harmless.

Thoughts?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-07-31 19:15:35 Re: [HACKERS] Autovacuum loose ends
Previous Message Alvaro Herrera 2005-07-31 17:02:33 Re: [HACKERS] Autovacuum loose ends