Re: pg_dump additional options for performance

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-patches(at)postgresql(dot)org
Subject: Re: pg_dump additional options for performance
Date: 2008-07-26 16:20:04
Message-ID: 23540.1217089204@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote:
>> The key problem is that pg_restore is broken:

> The key capability here is being able to split the dump into multiple
> pieces. The equivalent capability on restore is *not* required, because
> once the dump has been split the restore never needs to be.

This argument is nonsense. The typical usage of this capability, IMHO,
will be

pg_dump -Fc >whole.dump
pg_restore --schema-before-data whole.dump >before.sql
pg_restore --data-only whole.dump >data.sql
pg_restore --schema-after-data whole.dump >after.sql

followed by editing the schema pieces and then loading. One reason
is that this gives you a consistent dump, whereas three successive
pg_dump runs could never guarantee any such thing. Another reason
is that you may well not know when you prepare the dump that you
will need split output, because the requirement to edit the dump
is likely to be realized only when you go to load it.

In any case, why did you put the switches into pg_restore.c if you
thought it wasn't useful for pg_restore to handle them?

>> Not so easy to fix
>> is that COMMENTs might be either before or after data depending on what
>> kind of object they are attached to.

> Is there anything to fix?

Well, yeah. If you attach a comment to an after-data object and test
--schema-after-data, you'll notice the comment is lost.

>> And there's yet another issue here, which is that it's not entirely clear
>> that the type of an object uniquely determines whether it's before or
>> after data.

> Don't understand that. Objects are sorted in well-defined order,
> specified in pg_dump_sort.

After which we do a topological sort that enforces dependency ordering.
The question to worry about is whether there can ever be a dependency
from a normally-"before" object to a normally-"after" object, which
would cause the dependency sort to move the latter in front of the
former (in one way or another). I'm not saying that any such case can
occur today, but I don't think it's an impossibility for it to arise in
future. I don't want this relatively minor feature to be putting limits
on what kinds of dependencies the system can have.

> I'm conscious that the major work proposed will take weeks to complete

I don't think that what I am proposing is that complicated; I would
anticipate it requiring somewhere on the order of two dozen lines of
code. I was thinking of doing a preliminary loop through the TocEntry
list to identify the ordinal numbers of the first and last data items,
and then the main loop could compare a counter to those numbers to
decide which of the three sections it was in. Plus you'd need another
ArchiveEntry call someplace to prepare the "dummy data" item if one was
needed.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-07-26 16:24:07 Re: pg_dump additional options for performance
Previous Message Kenneth Marshall 2008-07-26 16:06:21 Re: [RFC] Unsigned integer support.

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2008-07-26 16:24:07 Re: pg_dump additional options for performance
Previous Message Stephen Frost 2008-07-26 15:11:10 Re: pg_dump additional options for performance