Re: pg_dump additional options for performance

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-patches(at)postgresql(dot)org
Subject: Re: pg_dump additional options for performance
Date: 2008-07-25 23:16:24
Message-ID: 2818.1217027784@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> [ pg_dump_beforeafter.v6.patch ]

I looked over this patch a bit. I have a proposal for a slightly
different way of defining the new switches:

* --schema-before-data, --data-only, and --schema-after-data can be
specified in any combination to obtain any subset of the full dump.
If none are specified (which would in itself be a useless combination)
then the default is to dump all three sections, just as if all three
were specified.

* --schema-only is defined as equivalent to specifying both
--schema-before-data and --schema-after-data.

The patch as submitted enforces what seem largely arbitrary restrictions
on combining these switches. It made some sense before to treat
specifying both --schema-only and --data-only as an error, but it's not
clear to me why you shouldn't be able to write both --schema-before-data
and --schema-after-data, especially when there's a switch right beside
them that appears to be equivalent to that combination. So let's just
allow all the combinations.

The attached updated patch implements and documents this behavior,
and gets rid of the special linkage between --disable-triggers and
--data-only as previously discussed.

Unfortunately there's still a lot of work to do, and I don't feel like
doing it so I'm bouncing this patch back for further work.

The key problem is that pg_restore is broken: it emits nearly the same
output for --schema-before-data and --schema-after-data, because it
doesn't have any way to distinguish which objects in a full dump file
belong where. This is because the filtering logic was put in the wrong
place, namely in the ArchiveEntry creation routines in pg_dump.c, when
where it really needs to happen is while scanning the TocEntry list in
RestoreArchive(). (Note: it is perhaps worth keeping the pg_dump.c
filters so as to avoid doing extra server queries for objects that we
aren't going to dump anyway, but the core logic has to be in
RestoreArchive.)

Another issue is that the rules for deciding which objects are "before
data" and which are "after data" are wrong. In particular ACLs are after
data not before data, which is relatively easy to fix. Not so easy to fix
is that COMMENTs might be either before or after data depending on what
kind of object they are attached to.

(BTW, what about BLOB COMMENTS? They definitely can't be "before data".
ISTM you could make a case for them being "after data", if you think that
comments are always schema. But there is also a case for considering
them as data, because the objects they are attached to are data. I kind
of like the latter approach because it would create an invariant that
comments appear in the same dump section as the object commented on.
Thoughts?)

Implementing the filtering by examining the type of a TocEntry in
RestoreArchive is a bit of a PITA, but it's probably possible. The
main bad thing about that is the need to make an explicit list of every
type of TocEntry that exists now or ever has been emitted by any past
version of pg_dump. The design concept was that the type tags are
mainly documentation, and while we've had to bend that in places (mostly
for backward-compatibility reasons) this would be the first place we'd
have to throw it overboard completely.

And there's yet another issue here, which is that it's not entirely clear
that the type of an object uniquely determines whether it's before or
after data. This might be an emergent property of the object sorting
rules, but there is certainly not anything positively guaranteeing that
the dependency-driven topological sort will produce such a result, and
especially not that that will always be true in the future. So the
approach seems a bit fragile.

We could perhaps get rid of that problem, as well as the need to implement
object-type-determination logic, if we were to make RestoreArchive define
the groupings according to position in the TocEntry list: everything
before the first TABLE DATA or BLOB (and BLOB COMMENT?) entry is "before"
data, everything after the last one is "after" data, everything in between
is data. Then we only need to identify object types that are considered
"data", which we already have a rule for (whether hadDumper is true).
This is pretty attractive until you stop to consider the possibility
that there aren't any data entries in an archive (ie, it was made with
--schema-only): then there's no way to identify the boundary points.

We could solve that problem by inserting a "dummy data" TOC entry where
the data would have appeared, but this will only work in new archive
files. With an implementation like this, pg_restore with
--schema-before-data or --schema-after-data won't behave very nicely on a
pre-8.4 --schema-only archive file. (Presumably it would act as though
all the objects were "before" data.) Is that a small enough corner case
to live with in order to gain implementation simplicity and robustness?

BTW, another incomplete item is that pg_dumpall should probably be taught
to accept and pass down --schema-before-data and --schema-after-data
switches.

regards, tom lane

Attachment Content-Type Size
pg_dump_beforeafter.v7.patch.gz application/octet-stream 9.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonah H. Harris 2008-07-25 23:31:09 Re: Research/Implementation of Nested Loop Join optimization
Previous Message Gregory Stark 2008-07-25 22:17:20 Re: Adding WHERE clause to pg_dump

Browse pgsql-patches by date

  From Date Subject
Next Message Greg Sabino Mullane 2008-07-26 02:39:34 Re: pg_dump additional options for performance
Previous Message Simon Riggs 2008-07-25 21:58:27 WIP: Transportable Optimizer Mode