Re: pg_dump additional options for performance

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump additional options for performance
Date: 2008-02-26 15:28:17
Message-ID: 20080226152817.GT528@svr2.hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 26, 2008 at 03:15:59PM +0000, Simon Riggs wrote:
> On Tue, 2008-02-26 at 10:03 -0500, Tom Lane wrote:
> > Magnus Hagander <magnus(at)hagander(dot)net> writes:
> > > On Tue, Feb 26, 2008 at 12:39:29AM -0500, Tom Lane wrote:
> > >> BTW, what exactly was the use-case for this?
> >
> > > One use-case would be when you have to make some small change to the schema
> > > while reloading it, that's still compatible with the data format. Then
> > > you'd dump schema-no-indexes-and-stuff, then *edit* that file, before
> > > reloading things. It's a lot easier to edit the file if it's not hundreds
> > > of gigabytes..
> >
> > This is a use-case for having switches that *extract* convenient subsets
> > of a dump archive. It does not mandate having pg_dump emit multiple
> > files. You could extract, say, the pre-data schema into a text SQL
> > script, edit it, load it, then extract the data and remainining script
> > directly into the database from the dump file.
> >
> > In short, what I think we need here is just some more conveniently
> > defined extraction filter switches than --schema-only and --data-only.
> > There's no need for any fundamental change to pg_dump's architecture.
>
> > Yes, I've read the subsequent discussion about a "directory" output
> > format. I think it's pointless complication --- or at least, that it's
> > a performance hack rather than a functionality one, with no chance of
> > any actual performance gain until we've parallelized pg_restore, and
> > with zero existing evidence that any gain would be had even then.
> >
> > BTW, if we avoid fooling with the definition of the archive format,
> > that also means that the extraction-switch patch should be relatively
> > independent of parallelization work, so the work could proceed
> > concurrently.
>
> So if I understand:
>
> * we add switches to pg_dump to dump out separate files with --pre,
> --post and --data (or other names) [TODO: Simon]
>
> * we add switches to pg_restore to load/dump from the single archive
> file the subsets of --pre, --post, --data [TODO: Magnus]

Not entirely sure how I rended up volounteering for that one ;-) And
frankly, if we're going to add the same switches with the same meaning to
both, it's probably more efficient if the same person does them both...

//Magnus

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2008-02-26 15:31:40 Re: [COMMITTERS] pgsql: Link postgres from all object files at once, to avoid the
Previous Message Gregory Stark 2008-02-26 15:25:27 Re: Bulk loading performance improvements