Re: pg_dump and dependencies and --section ... it's a mess

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: pg_dump and dependencies and --section ... it's a mess
Date: 2012-06-22 20:43:17
Message-ID: 18836.1340397797@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> I believe the right fix for both of these issues is to add knowledge of
> the section concept to the topological sort logic, so that an ordering
> that puts POST_DATA before DATA or PRE_DATA after DATA is considered to
> be a dependency-ordering violation. One way to do that is to add dummy
> "fencepost" objects to the sort, representing the start and end of the
> DATA section. However, these objects would need explicit dependency
> links to every other DumpableObject, so that doesn't sound very good
> from a performance standpoint. What I'm going to go look at is whether
> we can mark DumpableObjects with their SECTION codes at creation time
> (rather than adding that information at ArchiveEntry() time) and then
> have the topo sort logic take that marking into account in addition to
> the explicit dependency links.

I gave up on putting any fancy hacks into the topological sort code;
it would have made it extremely ugly, and besides every solution I could
think of wanted to have at least one extra auxiliary array with an entry
per DumpableObject. Which would have eaten about as much space as the
extra dependency links. Furthermore, it turns out that the post-data
dependencies need to be changeable, since rules and constraints should
only be forced to be post-data *if* we've decided to dump them
separately from their parent tables/views. (My original try at this
ended up forcing every rule & constraint to be dumped separately, which
is not what we want.) Putting knowledge of that into the core
topological sort code seemed right out. So the attached draft patch
does it the straightforward way, actually creating two dummy boundary
objects and setting up explicit dependency links with them.

I did some simple performance tests and found that this adds a
measurable but pretty negligible cost to pg_dump's runtime. For
instance, dumping several thousand empty tables went from 9.34 to
9.57 seconds of pg_dump CPU time, compared to multiple minutes of
CPU time spent on the backend side (even with the recent lockmanager
fixes). So I no longer feel any strong need to "optimize" the code.

A disadvantage of representing the dependencies explicitly is that
the ones attached to DATA and POST_DATA objects show up in the output
archive. I'm not particularly worried about this so far as HEAD and
9.2 are concerned, because the other patch to fix emitted dependencies
will make them go away again. But as I mentioned, I'm not big on
back-patching that one into 9.1. We could hack something simpler
to directly suppress dependencies on the boundary objects only, or
we could just write it off as not mattering much. I'd barely have
noticed it except I was testing whether I got an exact match to the
archive produced by an unpatched pg_dump (in cases not involving
the view-vs-constraint bug).

Anyway, the attached patch does seem to fix the constraint bug.

A possible objection to it is that there are now three different ways in
which the pg_dump code knows which DO_XXX object types go in which dump
section: the new addBoundaryDependencies() function knows this, the
SECTION_xxx arguments to ArchiveEntry calls know it, and the sort
ordering constants in pg_dump_sort.c have to agree too. My original
idea was to add an explicit section field to DumpableObject to reduce
the number of places that know this, but that would increase pg_dump's
memory consumption still more, and yet still not give us a single point
of knowledge. Has anybody got a better idea?

Barring objections or better ideas, I'll push forward with applying
this patch and the dependency-fixup patch.

regards, tom lane

Attachment Content-Type Size
enforce-section-boundaries-1.patch text/x-patch 16.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-06-22 21:02:42 Re: random failing builds on spoonbill - backends not exiting...
Previous Message Stefan Kaltenbrunner 2012-06-22 20:42:02 Re: random failing builds on spoonbill - backends not exiting...