Re: pg_dump additional options for performance

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: pg_dump additional options for performance
Date: 2008-07-24 07:37:11
Message-ID: 1216885031.3894.739.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches


On Wed, 2008-07-23 at 23:20 -0400, Stephen Frost wrote:

> * Simon Riggs (simon(at)2ndquadrant(dot)com) wrote:
> > ...and with command line help also.
>
> The documentation and whatnot looks good to me now. There are a couple
> of other issues I found while looking through and testing the patch
> though-

Thanks for a good review.

> Index: src/bin/pg_dump/pg_dump.c
> ===================================================================
> RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_dump.c,v
> retrieving revision 1.497
> diff -c -r1.497 pg_dump.c
> *** src/bin/pg_dump/pg_dump.c 20 Jul 2008 18:43:30 -0000 1.497
> --- src/bin/pg_dump/pg_dump.c 23 Jul 2008 17:04:24 -0000
> ***************
> *** 225,232 ****
> RestoreOptions *ropt;
>
> static int disable_triggers = 0;
> ! static int outputNoTablespaces = 0;
> static int use_setsessauth = 0;
>
> static struct option long_options[] = {
> {"data-only", no_argument, NULL, 'a'},
> --- 229,238 ----
> RestoreOptions *ropt;
>
> static int disable_triggers = 0;
> ! static int outputNoTablespaces = 0;
> static int use_setsessauth = 0;
> + static int use_schemaBeforeData;
> + static int use_schemaAfterData;
>
> static struct option long_options[] = {
> {"data-only", no_argument, NULL, 'a'},
> ***************
>
> This hunk appears to have a bit of gratuitous whitespace change, not a
> big deal tho.

OK

> ***************
> *** 464,474 ****
> [...]
> + if (dataOnly)
> + dumpObjFlags = REQ_DATA;
> +
> + if (use_schemaBeforeData == 1)
> + dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
> +
> + if (use_schemaAfterData == 1)
> + dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
> +
> + if (schemaOnly)
> + dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
> ***************
>
> It wouldn't kill to be consistant between testing for '== 1' and just
> checking for non-zero. Again, not really a big deal, and I wouldn't
> mention these if there weren't other issues.

OK

> ***************
> *** 646,652 ****
> * Dumping blobs is now default unless we saw an inclusion switch or -s
> * ... but even if we did see one of these, -b turns it back on.
> */
> ! if (include_everything && !schemaOnly)
> outputBlobs = true;
>
> /*
> --- 689,695 ----
> * Dumping blobs is now default unless we saw an inclusion switch or -s
> * ... but even if we did see one of these, -b turns it back on.
> */
> ! if (include_everything && WANT_PRE_SCHEMA(dumpObjFlags))
> outputBlobs = true;
>
> /*
> ***************
>
> Shouldn't this change be to "WANT_DATA(dumpObjFlags)"? That's what most
> of the '!schemaOnly' get translated to. Otherwise I think you would be
> getting blobs when you've asked for just schema-before-data, which
> doesn't seem like it'd make much sense.

Yes, fixed

> ***************
> *** 712,718 ****
> dumpStdStrings(g_fout);
>
> /* The database item is always next, unless we don't want it at all */
> ! if (include_everything && !dataOnly)
> dumpDatabase(g_fout);
>
> /* Now the rearrangeable objects. */
> --- 755,761 ----
> dumpStdStrings(g_fout);
>
> /* The database item is always next, unless we don't want it at all */
> ! if (include_everything && WANT_DATA(dumpObjFlags))
> dumpDatabase(g_fout);
>
> /* Now the rearrangeable objects. */
> ***************
>
> Shouldn't this be 'WANT_PRE_SCHEMA(dumpObjFlags)'?

Yes, fixed

> ***************
> *** 3414,3420 ****
> continue;
>
> /* Ignore indexes of tables not to be dumped */
> ! if (!tbinfo->dobj.dump)
> continue;
>
> if (g_verbose)
> --- 3459,3465 ----
> continue;
>
> /* Ignore indexes of tables not to be dumped */
> ! if (!tbinfo->dobj.dump || !WANT_POST_SCHEMA(dumpObjFlags))
> continue;
>
> if (g_verbose)
> ***************
>
> I didn't test this, but it strikes me as an unnecessary addition? If
> anything, wouldn't this check make more sense being done right after
> dropping into getIndexes()? No sense going through the loop just for
> fun.. Technically, it's a behavioral change for --data-only since it
> used to gather index information anyway, but it's a good optimization if
> done in the right place.

Agreed. I've just removed this. Patch not about optimising logic.

> Also around here, there doesn't appear to be any checking in
> dumpEnumType(), which strikes me as odd. Wouldn't that deserve a
>
> if (!WANT_PRE_SCHEMA(dumpObjFlags))
> return;
>
> check? If not even some kind of equivilant ->dobj.dump check..

Agreed. That appears to be a bug in pg_dump, since this would currently
dump enums if --data-only was specified, which in my understanding would
be wrong.

Have included this:

/* Skip if not to be dumped */
if (!tinfo->dobj.dump || !WANT_BEFORE_SCHEMA(dumpObjFlags))
return;

> ***************
> *** 9803,9809 ****
> tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
> }
>
> ! if (!schemaOnly)
> {
> resetPQExpBuffer(query);
> appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
> --- 9848,9854 ----
> tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
> }
>
> ! if (WANT_PRE_SCHEMA(dumpObjFlags))
> {
> resetPQExpBuffer(query);
> appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
> ***************
>
> This is a mistaken logic invert, which results in setval's not being
> dumped at all when pulling out each piece seperately. It should be:
>
> if (WANT_DATA(dumpObjFlags))
>
> so that setval's are correctly included on the --data-only piece. As
> --data-only previously existed, this would be a regression too.

OK, fixed.

> Index: src/bin/pg_dump/pg_restore.c
> ===================================================================
> RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/bin/pg_dump/pg_restore.c,v
> retrieving revision 1.88
> diff -c -r1.88 pg_restore.c
> *** src/bin/pg_dump/pg_restore.c 13 Apr 2008 03:49:22 -0000 1.88
> --- src/bin/pg_dump/pg_restore.c 23 Jul 2008 17:06:59 -0000
> + if (dataOnly)
> + dumpObjFlags = REQ_DATA;
> +
> + if (use_schemaBeforeData == 1)
> + dumpObjFlags = REQ_SCHEMA_BEFORE_DATA;
> +
> + if (use_schemaAfterData == 1)
> + dumpObjFlags = REQ_SCHEMA_AFTER_DATA;
> +
> + if (schemaOnly)
> + dumpObjFlags = (REQ_SCHEMA_BEFORE_DATA | REQ_SCHEMA_AFTER_DATA);
> +
> ***************
>
> Ditto previous comment on this, but in pg_restore.c.

OK

>
> ***************
> *** 405,410 ****
> --- 455,462 ----
> " do not restore data of tables that could not be\n"
> " created\n"));
> printf(_(" --no-tablespaces do not dump tablespace assignments\n"));
> + printf(_(" --schema-before-data dump only the part of schema before table data\n"));
> + printf(_(" --schema-after-data dump only the part of schema after table data\n"));
> printf(_(" --use-set-session-authorization\n"
> " use SESSION AUTHORIZATION commands instead of\n"
> " OWNER TO commands\n"));
> ***************
>
> Forgot to mention this on pg_dump.c, but in both pg_dump and pg_restore,
> and I hate to be the bearer of bad news, but your command-line
> documentation doesn't line up properly in the output. You shouldn't be
> using tabs there but instead should use spaces as the other help text
> does, so everything lines up nicely.

OK

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachment Content-Type Size
pg_dump_beforeafter.v6.patch text/x-patch 68.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2008-07-24 07:43:25 Re: Do we really want to migrate plproxy and citext into PG core distribution?
Previous Message Martijn van Oosterhout 2008-07-24 06:32:35 Re: [WIP] collation support revisited (phase 1)

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2008-07-24 07:54:01 Re: pg_dump additional options for performance
Previous Message Tatsuo Ishii 2008-07-24 04:55:37 Re: [PATCHES] WITH RECUSIVE patches 0723