On Tue, 2008-02-26 at 18:19 +0530, Tom Dunstan wrote:
> On Tue, Feb 26, 2008 at 5:35 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On Tue, 2008-02-26 at 12:46 +0100, Dimitri Fontaine wrote:
> > > As a user I'd really prefer all of this to be much more transparent, and could
> > > well imagine the -Fc format to be some kind of TOC + zip of table data + post
> > > load instructions (organized per table), or something like this.
> > > In fact just what you described, all embedded in a single file.
> > If it's in a single file then it won't perform as well as if it's in
> > separate files. We can put separate files on separate drives. We can
> > begin reloading one table while another is still unloading. The OS
> > will perform readahead for us on separate files, whereas interleaved
> > streams in one file will look like random I/O, etc.
> Yeah, writing multiple unknown-length streams to a single file in
> parallel is going to be all kinds of painful, and this use case seems
> to be the biggest complaint against a zip file kind of approach. I
> didn't know about the custom file format when I suggested the zip file
> one yesterday*, but a zip or equivalent has the major benefit of
> allowing the user to do manual inspection / tweaking of the dump
> because the file format is one that can be manipulated by standard
> tools. And zip wins over tar because it's indexed - if you want to
> extract just the schema and hack on it you don't need to touch your
> multi-GBs of data.
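[The indexed-extraction point above can be illustrated with a short sketch. This is hypothetical, not pg_dump's actual format: it assumes a zip-style dump containing a `schema.sql` member alongside per-table data members. Because zip keeps a central directory at the end of the archive, one member can be read without scanning the multi-GB data members, unlike tar, which must be read sequentially.]

```python
import zipfile

def read_schema(dump_path):
    """Read only the schema member from a zip-style dump.

    The central directory lets zipfile seek straight to the member,
    so the large data members are never touched.
    """
    with zipfile.ZipFile(dump_path) as zf:
        return zf.read("schema.sql").decode()
```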
> Perhaps a compromise: we specify a file system layout for table data
> files, pre/post scripts and other metadata that we want to be made
> available to pg_restore. By default, it gets dumped into a zip file /
> whatever, but a user who wants to get parallel unloads can pass a flag
> that tells pg_dump to stick it into a directory instead, with exactly
> the same file layout. Or how about this: if the filename given to
> pg_dump is a directory, spit out files in there, otherwise
> create/overwrite a single file.
> While it's a bit fiddly, putting data on separate drives would then
> involve something like symlinking the tablename inside the dump dir
> off to an appropriate mount point, but that's probably not much worse
> than running n different pg_dump commands specifying different files.
> Heck, if you've got lots of data and want very particular behavior,
> you've got to specify it somehow. :)
Separate files seem much simpler...
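[For illustration, the directory-or-file dispatch Tom proposed could be sketched like this. Everything here is hypothetical, not pg_dump's actual code: if the target is a directory, write one file per table (allowing parallel writers and per-drive symlinks); otherwise pack the same layout into a single zip archive.]

```python
import os
import zipfile

def write_dump(target, tables):
    """Write a dump with the same per-table layout either way.

    tables: mapping of table name -> data bytes (illustrative only).
    """
    if os.path.isdir(target):
        # Directory target: one file per table, ready for parallel I/O.
        for name, data in tables.items():
            with open(os.path.join(target, name + ".dat"), "wb") as f:
                f.write(data)
    else:
        # File target: identical layout inside a single zip archive.
        with zipfile.ZipFile(target, "w") as zf:
            for name, data in tables.items():
                zf.writestr(name + ".dat", data)
```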