Re: pg_dump additional options for performance

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: "Tom Dunstan" <pgsql(at)tomd(dot)cc>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg_dump additional options for performance
Date: 2008-02-26 13:38:06
Message-ID: 200802261438.08519.dfontaine@hi-media.com
Lists: pgsql-hackers

On Tuesday 26 February 2008, Tom Dunstan wrote:
> On Tue, Feb 26, 2008 at 5:35 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On Tuesday 26 February 2008, Dimitri Fontaine wrote:
> >> We could even support some option for the user to tell us which
> >> disk arrays to use for parallel dumping.
> >>
> >> pg_dump -j2 --dumpto=/mount/sda:/mount/sdb ... > mydb.dump
> >> pg_restore -j4 ... mydb.dump
> >
> > If it's in a single file then it won't perform as well as if it's in
> > separate files. We can put separate files on separate drives. We can
> > begin reloading one table while another is still unloading. The OS will
> > perform readahead for us on separate files, whereas on one file the
> > interleaved streams will look like random I/O. Etc.
>
> Yeah, writing multiple unknown-length streams to a single file in
> parallel is going to be all kinds of painful
[...]
> While it's a bit fiddly, putting data on separate drives would then
> involve something like symlinking the tablename inside the dump dir
> off to an appropriate mount point, but that's probably not much worse
> than running n different pg_dump commands specifying different files.
> Heck, if you've got lots of data and want very particular behavior,
> you've got to specify it somehow. :)

What I meant with the --dumpto=/mount/sda:/mount/sdb idea was that pg_dump
would unload the data to those directories (filesystems, disk arrays,
whatever), then prepare the final archive file from there.
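Until such an option exists, the per-table approach discussed above can be
approximated by hand. A minimal sketch, assuming a database named mydb, an
illustrative table list, and the two mount points from the example (the
commands are echoed rather than run; to parallelize for real, drop the echo,
background each pg_dump with & and finish with a wait):

```shell
DB=mydb
DIRS="/mount/sda /mount/sdb"
TABLES="orders customers invoices payments"

i=0
for t in $TABLES; do
    n=$(( i % 2 ))      # round-robin index into DIRS
    set -- $DIRS        # load mount points into positional parameters
    shift $n
    dir=$1
    # One custom-format dump per table, each on its own spindle:
    echo pg_dump -Fc -t "$t" -f "$dir/$t.dump" "$DB"
    i=$(( i + 1 ))
done
```

This gives separate files on separate drives, at the price of per-table
dumps no longer sharing a single consistent snapshot.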

We could even have the --dumpto option associate each entry with a process,
or define a special TOC syntax allowing for complex setups: pg_dump would
first dump a TOC for you to edit, then use the edited version to control the
parallel unloading, which disks to use for which tables, etc.
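For reference, pg_restore already offers this editable-TOC workflow on the
restore side; the idea above extends the same pattern to dumping. The dump
and database names below are illustrative, and the pg_restore invocations
are shown as comments since they need a real dump file; the runnable part
demonstrates how a TOC entry is disabled by prefixing its line with ';':

```shell
#   pg_restore -l mydb.dump > mydb.toc                 # write the TOC as text
#   (edit mydb.toc: reorder entries, prefix unwanted ones with ';')
#   pg_restore -L mydb.toc -d mydb mydb.dump           # restore per edited TOC

# Commenting out the data entry for a hypothetical table "big_log":
toc_line='2398; 0 16386 TABLE DATA public big_log postgres'
edited=$(printf '%s\n' "$toc_line" | sed 's/^\(.*TABLE DATA .* big_log .*\)$/;\1/')
echo "$edited"
```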

Those are exactly your ideas, just with an attempt to make them appear clear
and simple from the user's point of view, at the cost of some more work done
by the tools.
--
dim
