Skip site navigation (1) Skip section navigation (2)

Re: pg_dump additional options for performance

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: pg_dump additional options for performance
Date: 2008-07-26 18:03:23
Message-ID: 1217095404.16378.184.camel@jd-laptop (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-patches
On Sat, 2008-07-26 at 13:43 -0400, Tom Lane wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:

> > I dislike, and doubt that I'd use, this approach.  At the end of the
> > day, it ends up processing the same (very large amount of data) multiple
> > times.
> 
> Well, that's easily avoided: just replace the third step by restoring
> directly to the target database.
> 
> pg_restore --schema-before-data whole.dump >before.sql
> edit before.sql
> pg_restore --schema-after-data whole.dump >after.sql
> edit after.sql
> psql -f before.sql target_db
> pg_restore --data-only -d target_db whole.dump
> psql -f after.sql target_db

It seems to me we continue to hack a solution without a clear idea of
the problems involved. There are a number of what I would consider
significant issues with the backup / restore facilities as a whole with
PostgreSQL.

1. We use text based backups, even with custom format. We need a fast
binary representation as well.

2. We have no concurrency which means, anyone with any database over 50G
has unacceptable restore times.

3. We have to continue develop hacks to define custom utilization. Why
am I passing pre-data anything? It should be automatic. For example:

pg_backup (not dump, we aren't dumping. Dumping is usually associated
with some sort of crash or fould human behavoir. We are backing up).
   pg_backup -U <user> -D database -F -f mybackup.sqlc

If I were to extract <mybackup.sqlc> I would get:

  mybackup.datatypes
  mybackup.tables
  mybackup.data
  mybackup.primary_keys
  mybackup.indexes
  mybackup.constraints
  mybackup.grants

All would be the SQL representation.

Further I could do this:

   pg_restore -U <user> -D <database> --data-types -f mybackup.sqlc

Which would restore just the SQL representation of the data types.

Or:

   pg_restore -U <user> -D <database> --tables -f mybackup.sqlc

Which would restore *only* the tables. Yes it would error if I didn't
also specify --data-types.

Further we need to have concurrency capability. Once we have restored
datatypes and tables, there is zero reason not to launch connections on
data (and then primary keys and indexes) so:

   pg_restore -U <user> -D <database> -C 4 --full -f mybackup.sqlc

Which would launch four connections to the database, and perform a full
restore per mybackup.sqlc.

Oh and pg_dumpall? It should have been removed right around the release
of 7.2, pg_dump -A please.

Anyway, I leave other peeps to flame me into oblivion.

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




In response to

Responses

pgsql-hackers by date

Next:From: Andrew GierthDate: 2008-07-26 20:05:36
Subject: Re: WITH RECUSIVE patches 0723
Previous:From: Tom LaneDate: 2008-07-26 17:56:14
Subject: Re: pg_dump additional options for performance

pgsql-patches by date

Next:From: Andrew GierthDate: 2008-07-26 20:05:36
Subject: Re: WITH RECUSIVE patches 0723
Previous:From: Tom LaneDate: 2008-07-26 17:56:14
Subject: Re: pg_dump additional options for performance

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group