Re: pg_dump additional options for performance

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-patches(at)postgresql(dot)org
Subject: Re: pg_dump additional options for performance
Date: 2008-07-26 09:05:04
Message-ID: 1217063104.3894.1100.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches


On Fri, 2008-07-25 at 19:16 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > [ pg_dump_beforeafter.v6.patch ]

> Unfortunately there's still a lot of work to do, and I don't feel like
> doing it so I'm bouncing this patch back for further work.

Fair enough. Thanks for the review.

> The key problem is that pg_restore is broken: it emits nearly the same
> output for --schema-before-data and --schema-after-data, because it
> doesn't have any way to distinguish which objects in a full dump file
> belong where. This is because the filtering logic was put in the wrong
> place, namely in the ArchiveEntry creation routines in pg_dump.c, when
> where it really needs to happen is while scanning the TocEntry list in
> RestoreArchive(). (Note: it is perhaps worth keeping the pg_dump.c
> filters so as to avoid doing extra server queries for objects that we
> aren't going to dump anyway, but the core logic has to be in
> RestoreArchive.)

My feeling is that this would take the patch off-track.

The key capability here is being able to split the dump into multiple
pieces. The equivalent capability on restore is *not* required, because
once the dump has been split the restore never needs to be. It might
seem that the patch should be symmetrical with respect to pg_dump and
pg_restore, but I see no use case for the pg_restore case.

The title of this email confirms that as original intention.

> I looked over this patch a bit. I have a proposal for a slightly
> different way of defining the new switches:
>
> * --schema-before-data, --data-only, and --schema-after-data can be
> specified in any combination to obtain any subset of the full dump.
> If none are specified (which would in itself be a useless combination)
> then the default is to dump all three sections, just as if all three
> were specified.
>
> * --schema-only is defined as equivalent to specifying both
> --schema-before-data and --schema-after-data.
>
> The patch as submitted enforces what seem largely arbitrary restrictions
> on combining these switches. It made some sense before to treat
> specifying both --schema-only and --data-only as an error, but it's not
> clear to me why you shouldn't be able to write both --schema-before-data
> and --schema-after-data, especially when there's a switch right beside
> them that appears to be equivalent to that combination. So let's just
> allow all the combinations.

I had it both ways at various points in development. I'm happy with what
you propose.

> The attached updated patch implements and documents this behavior,
> and gets rid of the special linkage between --disable-triggers and
> --data-only as previously discussed.

OK

> Another issue is that the rules for deciding which objects are "before
> data" and which are "after data" are wrong. In particular ACLs are after
> data not before data, which is relatively easy to fix.

OK

> Not so easy to fix
> is that COMMENTs might be either before or after data depending on what
> kind of object they are attached to.

Is there anything to fix? Comments are added by calls to dumpComment,
which are always made in conjunction with the dump of an object. So if
you dump the object you dump the comment. As long as objects are
correctly split out then comments will be also.

> (BTW, what about BLOB COMMENTS? They definitely can't be "before data".
> ISTM you could make a case for them being "after data", if you think that
> comments are always schema. But there is also a case for considering
> them as data, because the objects they are attached to are data. I kind
> of like the latter approach because it would create an invariant that
> comments appear in the same dump section as the object commented on.
> Thoughts?)

Yes, data. I'll look at this.

> Implementing the filtering by examining the type of a TocEntry in
> RestoreArchive is a bit of a PITA, but it's probably possible. The
> main bad thing about that is the need to make an explicit list of every
> type of TocEntry that exists now or ever has been emitted by any past
> version of pg_dump. The design concept was that the type tags are
> mainly documentation, and while we've had to bend that in places (mostly
> for backward-compatibility reasons) this would be the first place we'd
> have to throw it overboard completely.
>
> And there's yet another issue here, which is that it's not entirely clear
> that the type of an object uniquely determines whether it's before or
> after data. This might be an emergent property of the object sorting
> rules, but there is certainly not anything positively guaranteeing that
> the dependency-driven topological sort will produce such a result, and
> especially not that that will always be true in the future. So the
> approach seems a bit fragile.

Don't understand that. Objects are sorted in well-defined order,
specified in pg_dump_sort.c. Essentially we are saying that (according
to current numbering)

--schema-before-data priority 1-8
--data-only priority 9-11
--schema-after-data priority 12+

So the sort is explicitly defined, not implicit. I can add comments to
ensure that people changing the priority of objects across those
boundaries would be causing problems.

> We could perhaps get rid of that problem, as well as the need to implement
> object-type-determination logic, if we were to make RestoreArchive define
> the groupings according to position in the TocEntry list: everything
> before the first TABLE DATA or BLOB (and BLOB COMMENT?) entry is "before"
> data, everything after the last one is "after" data, everything in between
> is data. Then we only need to identify object types that are considered
> "data", which we already have a rule for (whether hadDumper is true).
> This is pretty attractive until you stop to consider the possibility
> that there aren't any data entries in an archive (ie, it was made with
> --schema-only): then there's no way to identify the boundary points.
>
> We could solve that problem by inserting a "dummy data" TOC entry where
> the data would have appeared, but this will only work in new archive
> files. With an implementation like this, pg_restore with
> --schema-before-data or --schema-after-data won't behave very nicely on a
> pre-8.4 --schema-only archive file. (Presumably it would act as though
> all the objects were "before" data.) Is that a small enough corner case
> to live with in order to gain implementation simplicity and robustness?

All of the above makes me certain I want to remove these options from
pg_restore.

> BTW, another incomplete item is that pg_dumpall should probably be taught
> to accept and pass down --schema-before-data and --schema-after-data
> switches.

OK

I'm conscious that the major work proposed will take weeks to complete
and we don't know what other problems it will cause (but I'm pretty
certain it will cause some). With regard to the use case, I see little
or no benefit from either of us doing that and regret I won't be able to
complete that.

Can we prune down to the base use case to avoid this overhead? i.e. have
these options on pg_dump only?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2008-07-26 09:08:31 Re: Transaction-controlled robustness for replication
Previous Message Markus Wanner 2008-07-26 08:17:18 Re: Transaction-controlled robustness for replication

Browse pgsql-patches by date

  From Date Subject
Next Message Stephen Frost 2008-07-26 15:11:10 Re: pg_dump additional options for performance
Previous Message Tatsuo Ishii 2008-07-26 03:23:46 Re: [PATCHES] WITH RECUSIVE patches 0723