pg_dump and pg_dumpall in real life

From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: rafael(at)postgresql(dot)org(dot)es
Subject: pg_dump and pg_dumpall in real life
Date: 2013-11-11 13:59:10
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hash: SHA1


After some discussions in #pgconfeu, this is an attempt to relaunch
the discussion about how pg_dump and pg_dumpall work and the
challenges they give us in real life.

We have got bitten sometimes because of their behavior and we can see
it is a difficult subject for new postgres users even if they have
long experience with others databases.

Logical backups are used for restores or cloning purposes. If as a
database administrator you want to be able to do one of these
procedures for a database, you need to do this in advance today:

1) Use pg_dump to dump schema, data and privileges of the database.
2) Use pg_dumpall -g to dump global objects
3) Use pg_dumpall -g to dump ALTER ROLE ... SET ... data
4) Use pg_dumpall to dump possible ALTER DATABASE ... SET ... data

For a cloning procedure we need the samme steps but usually and in
addition we have to change the name of the owner/database when
importing the dumps.

If you have just a few and not very complicated databases in your
cluster, these steps will not be very complicated although very

Imagine you have several hundred databases in your cluster, with
several hundred users owning some objects and with grants in others.

Imagine you are cloning or restoring only one or a few of these
databases to another server. For 2), 3) and 4) you will have to parse
the output from pg_dumpall to get the few global objects, ALTER ROLE
and ALTER DATABASE data for the few databases you are restoring.

In addition, if you have used "GRANT .. ON .. TO ..." in your database
objects you will have to take care of this manually to find out who
has extra privileges in your objects, so you can also get the right
information from pg_dumpall -g.

You don't need a lot of imagination to understand what a mess this can
be when moving data around. Not to talk about the possibility of doing
something wrong in the process and not be a very robust solution.

After many years of using pg_dump/pg_dumpall, this is our experience,
our wishes and thoughts:

* We need a pg_dump solution that can generate in one step all the
necessary pieces of information needed when restoring or cloning a
database. (schema, data, privileges, users and alter database/role data)

* It would be great to be able to tell pg_restore that user1 in the
dump will became user2 in the restored/cloned database. The same for
the name of the database.

* For serious backup management of large and complicated databases,
pg_dump with the custom output + pg_restore is the only feasible solution.

What do you think about the subject? Does it sound like a reasonable
proposition? What do we need to implement some of these changes?

Thanks in advance for your time.

Some background information:


- --
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key:
Version: GnuPG v2.0.14 (GNU/Linux)



Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-11-11 14:03:45 Re: TABLE not synonymous with SELECT * FROM?
Previous Message Colin 't Hart 2013-11-11 13:50:10 Re: Execute query with EXCEPT, INTERSECT as anti-join, join?