Skip site navigation (1) Skip section navigation (2)

Re: pg_dump additional options for performance

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump additional options for performance
Date: 2008-02-26 15:15:59
Message-ID: 1204038959.4252.321.camel@ebony.site (view raw or flat)
Thread:
Lists: pgsql-hackers
On Tue, 2008-02-26 at 10:03 -0500, Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
> > On Tue, Feb 26, 2008 at 12:39:29AM -0500, Tom Lane wrote:
> >> BTW, what exactly was the use-case for this?
> 
> > One use-case would be when you have to make some small change to the schema
> > while reloading it, that's still compatible with the data format. Then
> > you'd dump schema-no-indexes-and-stuff, then *edit* that file, before
> > reloading things. It's a lot easier to edit the file if it's not hundreds
> > of gigabytes..
> 
> This is a use-case for having switches that *extract* convenient subsets
> of a dump archive.  It does not mandate having pg_dump emit multiple
> files.  You could extract, say, the pre-data schema into a text SQL
> script, edit it, load it, then extract the data and remainining script
> directly into the database from the dump file.
> 
> In short, what I think we need here is just some more conveniently
> defined extraction filter switches than --schema-only and --data-only.
> There's no need for any fundamental change to pg_dump's architecture.

> Yes, I've read the subsequent discussion about a "directory" output
> format.  I think it's pointless complication --- or at least, that it's
> a performance hack rather than a functionality one, with no chance of
> any actual performance gain until we've parallelized pg_restore, and
> with zero existing evidence that any gain would be had even then.
> 
> BTW, if we avoid fooling with the definition of the archive format,
> that also means that the extraction-switch patch should be relatively
> independent of parallelization work, so the work could proceed
> concurrently.

So if I understand:

* we add switches to pg_dump to dump out separate files with --pre,
--post and --data (or other names) [TODO: Simon]

* we add switches to pg_restore to load/dump from the single archive
file the subsets of --pre, --post, --data [TODO: Magnus]

Everybody agree?

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 


In response to

Responses

pgsql-hackers by date

Next:From: David BOURIAUDDate: 2008-02-26 15:23:18
Subject: Re: pg_dump additional options for performance
Previous:From: Andrew DunstanDate: 2008-02-26 15:15:39
Subject: Re: pg_dump additional options for performance

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group