Quick Links

Re: pg_dump additional options for performance

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: pg_dump additional options for performance
Date:	2008-02-26 15:03:57
Message-ID:	2284.1204038237@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Tue, Feb 26, 2008 at 12:39:29AM -0500, Tom Lane wrote:
>> BTW, what exactly was the use-case for this?

> One use-case would be when you have to make some small change to the schema
> while reloading it, that's still compatible with the data format. Then
> you'd dump schema-no-indexes-and-stuff, then *edit* that file, before
> reloading things. It's a lot easier to edit the file if it's not hundreds
> of gigabytes..

This is a use-case for having switches that *extract* convenient subsets
of a dump archive. It does not mandate having pg_dump emit multiple
files. You could extract, say, the pre-data schema into a text SQL
script, edit it, load it, then extract the data and remainining script
directly into the database from the dump file.

In short, what I think we need here is just some more conveniently
defined extraction filter switches than --schema-only and --data-only.
There's no need for any fundamental change to pg_dump's architecture.

Yes, I've read the subsequent discussion about a "directory" output
format. I think it's pointless complication --- or at least, that it's
a performance hack rather than a functionality one, with no chance of
any actual performance gain until we've parallelized pg_restore, and
with zero existing evidence that any gain would be had even then.

BTW, if we avoid fooling with the definition of the archive format,
that also means that the extraction-switch patch should be relatively
independent of parallelization work, so the work could proceed
concurrently.

regards, tom lane

In response to

Re: pg_dump additional options for performance at 2008-02-26 11:31:38 from Magnus Hagander

Responses

Re: pg_dump additional options for performance at 2008-02-26 15:15:39 from Andrew Dunstan
Re: pg_dump additional options for performance at 2008-02-26 15:15:59 from Simon Riggs
Re: pg_dump additional options for performance at 2008-02-26 15:23:18 from David BOURIAUD

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	manolo.espa	2008-02-26 15:10:38	Re: [HACKERS] 2WRS [WIP]
Previous Message	Simon Riggs	2008-02-26 14:46:38	Bulk loading performance improvements