Re: pg_dump additional options for performance

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Dunstan <pgsql(at)tomd(dot)cc>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg_dump additional options for performance
Date: 2008-02-26 13:05:11
Message-ID: 1204031111.4252.255.camel@ebony.site
Lists: pgsql-hackers

On Tue, 2008-02-26 at 18:19 +0530, Tom Dunstan wrote:
> On Tue, Feb 26, 2008 at 5:35 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On Tue, 2008-02-26 at 12:46 +0100, Dimitri Fontaine wrote:
> > > As a user I'd really prefer all of this to be much more transparent, and could
> > > well imagine the -Fc format to be some kind of TOC + zip of table data + post
> > > load instructions (organized per table), or something like this.
> > > In fact just what you described, all embedded in a single file.
> >
> > If it's in a single file then it won't perform as well as if it's in separate
> > files. We can put separate files on separate drives. We can begin
> > reloading one table while another is still unloading. The OS will
> > perform readahead for us on separate files, whereas with one file the
> > access pattern will look like random I/O, etc.
>
> Yeah, writing multiple unknown-length streams to a single file in
> parallel is going to be all kinds of painful, and this use case seems
> to be the biggest complaint against a zip file kind of approach. I
> didn't know about the custom file format when I suggested the zip file
> one yesterday*, but a zip or equivalent has the major benefit of
> allowing the user to do manual inspection / tweaking of the dump
> because the file format is one that can be manipulated by standard
> tools. And zip wins over tar because it's indexed - if you want to
> extract just the schema and hack on it you don't need to touch your
> multi-GBs of data.
>
> Perhaps a compromise: we specify a file system layout for table data
> files, pre/post scripts and other metadata that we want to be made
> available to pg_restore. By default, it gets dumped into a zip file /
> whatever, but a user who wants to get parallel unloads can pass a flag
> that tells pg_dump to stick it into a directory instead, with exactly
> the same file layout. Or how about this: if the filename given to
> pg_dump is a directory, spit out files in there, otherwise
> create/overwrite a single file.
>
> While it's a bit fiddly, putting data on separate drives would then
> involve something like symlinking the tablename inside the dump dir
> off to an appropriate mount point, but that's probably not much worse
> than running n different pg_dump commands specifying different files.
> Heck, if you've got lots of data and want very particular behavior,
> you've got to specify it somehow. :)

Separate files seem much simpler...
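
For illustration only, a minimal sketch (in Python, not part of this thread) of the separate-files approach: one pg_dump invocation per table, run in parallel, each writing its table's data to its own file. The database name, table names and output path are hypothetical placeholders.

# A minimal sketch, not from the original thread: dump each table's data to
# its own file with one pg_dump process per table, run in parallel.
# "mydb", the table names and OUTDIR are hypothetical placeholders; per-table
# files could be placed on different drives by using different mount points.
import subprocess
from concurrent.futures import ThreadPoolExecutor

DBNAME = "mydb"                       # hypothetical database name
TABLES = ["orders", "order_lines"]    # hypothetical table names
OUTDIR = "/mnt/dump"                  # hypothetical output location

def dump_table(table):
    # --data-only dumps just the table contents; the schema ("pre" and "post"
    # scripts) would be dumped separately, as discussed in the thread.
    outfile = f"{OUTDIR}/{table}.sql"
    cmd = ["pg_dump", "--data-only", "-t", table, "-f", outfile, DBNAME]
    return subprocess.run(cmd).returncode

with ThreadPoolExecutor(max_workers=len(TABLES)) as pool:
    print("exit codes:", list(pool.map(dump_table, TABLES)))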

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
