Skip site navigation (1) Skip section navigation (2)

Re: pg_dump additional options for performance

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Dunstan <pgsql(at)tomd(dot)cc>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg_dump additional options for performance
Date: 2008-02-26 13:05:11
Message-ID: 1204031111.4252.255.camel@ebony.site (view raw or flat)
Thread:
Lists: pgsql-hackers
On Tue, 2008-02-26 at 18:19 +0530, Tom Dunstan wrote:
> On Tue, Feb 26, 2008 at 5:35 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On Tue, 2008-02-26 at 12:46 +0100, Dimitri Fontaine wrote:
> >  > As a user I'd really prefer all of this to be much more transparent, and could
> >  > well imagine the -Fc format to be some kind of TOC + zip of table data + post
> >  > load instructions (organized per table), or something like this.
> >  > In fact just what you described, all embedded in a single file.
> >
> >  If its in a single file then it won't perform as well as if its separate
> >  files. We can put separate files on separate drives. We can begin
> >  reloading one table while another is still unloading. The OS will
> >  perform readahead for us on single files whereas on one file it will
> >  look like random I/O. etc.
> 
> Yeah, writing multiple unknown-length streams to a single file in
> parallel is going to be all kinds of painful, and this use case seems
> to be the biggest complaint against a zip file kind of approach. I
> didn't know about the custom file format when I suggested the zip file
> one yesterday*, but a zip or equivalent has the major benefit of
> allowing the user to do manual inspection / tweaking of the dump
> because the file format is one that can be manipulated by standard
> tools. And zip wins over tar because it's indexed - if you want to
> extract just the schema and hack on it you don't need to touch your
> multi-GBs of data.
> 
> Perhaps a compromise: we specify a file system layout for table data
> files, pre/post scripts and other metadata that we want to be made
> available to pg_restore. By default, it gets dumped into a zip file /
> whatever, but a user who wants to get parallel unloads can pass a flag
> that tells pg_dump to stick it into a directory instead, with exactly
> the same file layout. Or how about this: if the filename given to
> pg_dump is a directory, spit out files in there, otherwise
> create/overwrite a single file.
> 
> While it's a bit fiddly, putting data on separate drives would then
> involve something like symlinking the tablename inside the dump dir
> off to an appropriate mount point, but that's probably not much worse
> than running n different pg_dump commands specifying different files.
> Heck, if you've got lots of data and want very particular behavior,
> you've got to specify it somehow. :)

Separate files seems much simpler...

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 


In response to

Responses

pgsql-hackers by date

Next:From: mac_man2005Date: 2008-02-26 13:15:00
Subject: Re: [HACKERS] 2WRS [WIP]
Previous:From: Zeugswetter Andreas ADI SDDate: 2008-02-26 13:01:08
Subject: Re: pg_dump additional options for performance

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group