Re: pg_dump additional options for performance

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: pg_dump additional options for performance
Date: 2008-07-20 21:43:35
Message-ID: 20080720214335.GB18846@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Simon,

* Simon Riggs (simon(at)2ndquadrant(dot)com) wrote:
> On Sun, 2008-07-20 at 05:47 +0100, Simon Riggs wrote:
> > On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote:
[...]
> > > - Conflicting option handling

Thanks for putting in the extra code to explicitly indicate which
conflicting options were used.

> > > - Documentation
> > > When writing the documentation I would stress that "pre-schema" and
> > > "post-schema" be defined in terms of PostgreSQL objects and why they
> > > are pre vs. post.

Perhaps this is up for some debate, but I find the documentation added
for these options to be lacking the definitions I was looking for, and
the explanation of why they are what they are. I'm also not sure I
agree with the "Pre-Schema" and "Post-Schema" nomenclature as it doesn't
really fit with the option names or what they do. Would you consider:

<term><option>--schema-pre-load</option></term>
<listitem>
<para>
Pre-Data Load - Minimum amount of the schema required before data
loading can begin. This consists mainly of creating the tables
using the <command>CREATE TABLE</command>.
This part can be requested using <option>--schema-pre-load</>.
</para>
</listitem>

<term><option>--schema-post-load</option></term>
<listitem>
<para>
Post-Data Load - The rest of the schema definition, including keys,
indexes, etc. By putting keys and indexes after the data has been
loaded the whole process of restoring data is much faster. This is
because it is faster to build indexes and check keys in bulk than
piecemeal as the data is loaded.
This part can be requested using <option>--schema-post-load</>.
</para>
</listitem>

Even this doesn't cover everything though- it's too focused on tables
and data loading. Where do functions go? What about types?

A couple of additional points:

- The command-line help hasn't been updated. Clearly, that also needs
to be done to consider the documentation aspect complete.

- There appears to be a bit of mistakenly included additions. The
patch to pg_restore.sgml attempts to add in documentation for
--superuser. I'm guessing that was unintentional, and looks like
just a mistaken extra copy&paste.

> > > - Technically, the patch needs to be updated slightly since another
> > > pg_dump-related patch was committed recently which also added
> > > options and thus causes a conflict.

I think this might have just happened again, funny enough. It's
something that a committer could perhaps fix, but if you're reworking
the patch anyway...

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2008-07-20 21:59:40 Re: pg_dump additional options for performance
Previous Message Tom Lane 2008-07-20 19:05:27 Re: CommitFest dragging?

Browse pgsql-patches by date

  From Date Subject
Next Message Simon Riggs 2008-07-20 21:59:40 Re: pg_dump additional options for performance
Previous Message Tom Lane 2008-07-20 18:50:50 Re: pg_dump lock timeout