Re: pg_dump additional options for performance

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: pg_dump additional options for performance
Date: 2008-07-26 18:03:23
Message-ID: 1217095404.16378.184.camel@jd-laptop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On Sat, 2008-07-26 at 13:43 -0400, Tom Lane wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:

> > I dislike, and doubt that I'd use, this approach. At the end of the
> > day, it ends up processing the same (very large amount of data) multiple
> > times.
>
> Well, that's easily avoided: just replace the third step by restoring
> directly to the target database.
>
> pg_restore --schema-before-data whole.dump >before.sql
> edit before.sql
> pg_restore --schema-after-data whole.dump >after.sql
> edit after.sql
> psql -f before.sql target_db
> pg_restore --data-only -d target_db whole.dump
> psql -f after.sql target_db

It seems to me we continue to hack a solution without a clear idea of
the problems involved. There are a number of what I would consider
significant issues with the backup / restore facilities as a whole with
PostgreSQL.

1. We use text based backups, even with custom format. We need a fast
binary representation as well.

2. We have no concurrency which means, anyone with any database over 50G
has unacceptable restore times.

3. We have to continue develop hacks to define custom utilization. Why
am I passing pre-data anything? It should be automatic. For example:

pg_backup (not dump, we aren't dumping. Dumping is usually associated
with some sort of crash or fould human behavoir. We are backing up).
pg_backup -U <user> -D database -F -f mybackup.sqlc

If I were to extract <mybackup.sqlc> I would get:

mybackup.datatypes
mybackup.tables
mybackup.data
mybackup.primary_keys
mybackup.indexes
mybackup.constraints
mybackup.grants

All would be the SQL representation.

Further I could do this:

pg_restore -U <user> -D <database> --data-types -f mybackup.sqlc

Which would restore just the SQL representation of the data types.

Or:

pg_restore -U <user> -D <database> --tables -f mybackup.sqlc

Which would restore *only* the tables. Yes it would error if I didn't
also specify --data-types.

Further we need to have concurrency capability. Once we have restored
datatypes and tables, there is zero reason not to launch connections on
data (and then primary keys and indexes) so:

pg_restore -U <user> -D <database> -C 4 --full -f mybackup.sqlc

Which would launch four connections to the database, and perform a full
restore per mybackup.sqlc.

Oh and pg_dumpall? It should have been removed right around the release
of 7.2, pg_dump -A please.

Anyway, I leave other peeps to flame me into oblivion.

Sincerely,

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Gierth 2008-07-26 20:05:36 Re: WITH RECUSIVE patches 0723
Previous Message Tom Lane 2008-07-26 17:56:14 Re: pg_dump additional options for performance

Browse pgsql-patches by date

  From Date Subject
Next Message Andrew Gierth 2008-07-26 20:05:36 Re: WITH RECUSIVE patches 0723
Previous Message Tom Lane 2008-07-26 17:56:14 Re: pg_dump additional options for performance