Re: pg_dump additional options for performance

From: "Tom Dunstan" <pgsql(at)tomd(dot)cc>
To: "Jochem van Dieten" <jochemd(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump additional options for performance
Date: 2008-02-25 09:06:09
Message-ID: ca33c0a30802250106w5c9d2b8ey56ea358c6c950f43@mail.gmail.com
Lists: pgsql-hackers

On Sun, Feb 24, 2008 at 6:52 PM, Jochem van Dieten <jochemd(at)gmail(dot)com> wrote:
> Or we could have a switch that specifies a directory and have pg_dump
> split the dump not just in pre-schema, data and post-schema, but also
> split the data in a file for each table. That would greatly facilitate
> a parallel restore of the data through multiple connections.

<delurk>

I'll admit to thinking something similar while reading this thread,
mostly because having to specify multiple filenames just to do a dump
and then specify them all again on the way back in seemed horrible.
My idea was to stick the multiple streams into a structured container
file rather than a directory, though - a zip file a la JAR/ODF leapt
to mind. That has the nice property of being a single dump file with
optional built-in compression that could store all the data as
separate streams, and it would allow a smart restore program to do as
much in parallel as makes sense. Mucking around with directories or
three different filenames or whatever is a pain. I'll bet most users
want to say "pg_dump --dump-file=foo.zip foo", back up foo.zip as
appropriate, and when restoring say "pg_restore --dump-file=foo.zip
-j 4" or whatever and have pg_restore do the rest. The other nice
thing about using a zip file as a container is that you can inspect
it with standard tools if you need to.
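
Something like this, as a rough Python sketch of the shape I mean
(the member names and the run_sql/copy_in hooks are invented for
illustration, not anything pg_dump actually does):

import zipfile
from concurrent.futures import ThreadPoolExecutor

def write_dump(path, pre_schema, tables, post_schema):
    # tables: dict mapping table name -> COPY-format text
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("pre-schema.sql", pre_schema)
        for name, copy_data in tables.items():
            # one member per table, so a restore can fan the data
            # loads out across several connections
            zf.writestr("data/%s.copy" % name, copy_data)
        zf.writestr("post-schema.sql", post_schema)

def restore_dump(path, jobs, run_sql, copy_in):
    # run_sql/copy_in stand in for whatever executes SQL and feeds
    # COPY data over a connection
    with zipfile.ZipFile(path) as zf:
        run_sql(zf.read("pre-schema.sql").decode())
        with ThreadPoolExecutor(max_workers=jobs) as pool:
            for name in zf.namelist():
                if name.startswith("data/"):
                    pool.submit(copy_in, name, zf.read(name).decode())
        run_sql(zf.read("post-schema.sql").decode())

The data loads run in parallel up to the -j limit, the post-schema
step (indexes, constraints) waits for all of them, and you can still
look inside the file with unzip -l.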

Another thought is that doing things this way would allow us to add
extra metadata to the dump in later versions without giving the user
yet another command-line switch for an extra file. Or even, thinking
a bit more outside the box, it would allow us to store data in binary
format if that's what the user wants at some point (thinking of the
output from binary I/O rather than the on-disk representation,
obviously). Exposing all the internals of this stuff via N
command-line args is pretty constraining - it would be nice if
pg_dump just produced the most efficient dump, and if we decide at a
later date that that means doing things a bit differently, then we
bump the dump file version and just do it.
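
And to make the versioning concrete, a minimal sketch of what a
metadata member inside the container might look like (the manifest
name, its fields and the JSON choice are all just for illustration):

import json
import zipfile

DUMP_VERSION = 1

def write_manifest(zf, server_version, data_format="copy-text"):
    # data_format could grow a "binary" value later without any new
    # command-line switch
    zf.writestr("manifest.json", json.dumps({
        "dump_version": DUMP_VERSION,
        "server_version": server_version,
        "data_format": data_format,
    }))

def read_manifest(zf):
    manifest = json.loads(zf.read("manifest.json").decode())
    if manifest["dump_version"] > DUMP_VERSION:
        raise ValueError("dump made by a newer pg_dump - upgrade first")
    return manifest

A restore tool that finds a dump_version newer than it understands
can bail out cleanly instead of guessing at the layout.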

Just a thought...

Cheers

Tom
