Re: pg_dump additional options for performance

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: pg_dump additional options for performance
Date: 2008-07-24 03:20:47
Message-ID: 20080724032047.GI18846@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Simon,

* Simon Riggs (simon(at)2ndquadrant(dot)com) wrote:
> ...and with command line help also.

The documentation and whatnot looks good to me now. There are a couple
of other issues I found while looking through and testing the patch
though-

Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.497
diff -c -r1.497 pg_dump.c
*** src/bin/pg_dump/pg_dump.c 20 Jul 2008 18:43:30 -0000 1.497
--- src/bin/pg_dump/pg_dump.c 23 Jul 2008 17:04:24 -0000
***************
*** 225,232 ****
RestoreOptions *ropt;

static int disable_triggers = 0;
! static int outputNoTablespaces = 0;
static int use_setsessauth = 0;

static struct option long_options[] = {
{"data-only", no_argument, NULL, 'a'},
--- 229,238 ----
RestoreOptions *ropt;

static int disable_triggers = 0;
! static int outputNoTablespaces = 0;
static int use_setsessauth = 0;
+ static int use_schemaBeforeData;
+ static int use_schemaAfterData;

static struct option long_options[] = {
{"data-only", no_argument, NULL, 'a'},
***************

This hunk appears to have a bit of gratuitous whitespace change, not a
big deal tho.

***************
*** 464,474 ****
[...]
+ if (dataOnly)
+ dumpObjFlags = REQ_DATA;
+
+ if (use_schemaBeforeData == 1)
+ dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+
+ if (use_schemaAfterData == 1)
+ dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+
+ if (schemaOnly)
+ dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
***************

It wouldn't kill to be consistant between testing for '== 1' and just
checking for non-zero. Again, not really a big deal, and I wouldn't
mention these if there weren't other issues.

***************
*** 646,652 ****
* Dumping blobs is now default unless we saw an inclusion switch or -s
* ... but even if we did see one of these, -b turns it back on.
*/
! if (include_everything && !schemaOnly)
outputBlobs = true;

/*
--- 689,695 ----
* Dumping blobs is now default unless we saw an inclusion switch or -s
* ... but even if we did see one of these, -b turns it back on.
*/
! if (include_everything && WANT_PRE_SCHEMA(dumpObjFlags))
outputBlobs = true;

/*
***************

Shouldn't this change be to "WANT_DATA(dumpObjFlags)"? That's what most
of the '!schemaOnly' get translated to. Otherwise I think you would be
getting blobs when you've asked for just schema-before-data, which
doesn't seem like it'd make much sense.

***************
*** 712,718 ****
dumpStdStrings(g_fout);

/* The database item is always next, unless we don't want it at all */
! if (include_everything && !dataOnly)
dumpDatabase(g_fout);

/* Now the rearrangeable objects. */
--- 755,761 ----
dumpStdStrings(g_fout);

/* The database item is always next, unless we don't want it at all */
! if (include_everything && WANT_DATA(dumpObjFlags))
dumpDatabase(g_fout);

/* Now the rearrangeable objects. */
***************

Shouldn't this be 'WANT_PRE_SCHEMA(dumpObjFlags)'?

***************
*** 3414,3420 ****
continue;

/* Ignore indexes of tables not to be dumped */
! if (!tbinfo->dobj.dump)
continue;

if (g_verbose)
--- 3459,3465 ----
continue;

/* Ignore indexes of tables not to be dumped */
! if (!tbinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
continue;

if (g_verbose)
***************

I didn't test this, but it strikes me as an unnecessary addition? If
anything, wouldn't this check make more sense being done right after
dropping into getIndexes()? No sense going through the loop just for
fun.. Technically, it's a behavioral change for --data-only since it
used to gather index information anyway, but it's a good optimization if
done in the right place.

Also around here, there doesn't appear to be any checking in
dumpEnumType(), which strikes me as odd. Wouldn't that deserve a

if (!WANT_PRE_SCHEMA(dumpObjFlags))
return;

check? If not even some kind of equivilant ->dobj.dump check..

***************
*** 9803,9809 ****
tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
}

! if (!schemaOnly)
{
resetPQExpBuffer(query);
appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
--- 9848,9854 ----
tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
}

! if (WANT_PRE_SCHEMA(dumpObjFlags))
{
resetPQExpBuffer(query);
appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
***************

This is a mistaken logic invert, which results in setval's not being
dumped at all when pulling out each piece seperately. It should be:

if (WANT_DATA(dumpObjFlags))

so that setval's are correctly included on the --data-only piece. As
--data-only previously existed, this would be a regression too.

Index: src/bin/pg_dump/pg_restore.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_restore.c,v
retrieving revision 1.88
diff -c -r1.88 pg_restore.c
*** src/bin/pg_dump/pg_restore.c 13 Apr 2008 03:49:22 -0000 1.88
--- src/bin/pg_dump/pg_restore.c 23 Jul 2008 17:06:59 -0000
+ if (dataOnly)
+ dumpObjFlags = REQ_DATA;
+
+ if (use_schemaBeforeData == 1)
+ dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
+
+ if (use_schemaAfterData == 1)
+ dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
+
+ if (schemaOnly)
+ dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
+
***************

Ditto previous comment on this, but in pg_restore.c.

***************
*** 405,410 ****
--- 455,462 ----
" do not restore data of tables that could not be\n"
" created\n"));
printf(_(" --no-tablespaces do not dump tablespace assignments\n"));
+ printf(_(" --schema-before-data dump only the part of schema before table data\n"));
+ printf(_(" --schema-after-data dump only the part of schema after table data\n"));
printf(_(" --use-set-session-authorization\n"
" use SESSION AUTHORIZATION commands instead of\n"
" OWNER TO commands\n"));
***************

Forgot to mention this on pg_dump.c, but in both pg_dump and pg_restore,
and I hate to be the bearer of bad news, but your command-line
documentation doesn't line up properly in the output. You shouldn't be
using tabs there but instead should use spaces as the other help text
does, so everything lines up nicely.

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2008-07-24 04:55:37 Re: [PATCHES] WITH RECUSIVE patches 0723
Previous Message Andrew Dunstan 2008-07-24 03:16:18 Re: issues/experience with building postgres on Windows

Browse pgsql-patches by date

  From Date Subject
Next Message Tatsuo Ishii 2008-07-24 04:55:37 Re: [PATCHES] WITH RECUSIVE patches 0723
Previous Message Tatsuo Ishii 2008-07-24 01:19:05 Re: [PATCHES] WITH RECUSIVE patches 0723