Re: pg_dump --split patch

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joel Jacobson <joel(at)gluefinance(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump --split patch
Date: 2010-12-28 17:33:06
Message-ID: 6866.1293557586@sss.pgh.pa.us
Lists: pgsql-hackers

Joel Jacobson <joel(at)gluefinance(dot)com> writes:
> 2010/12/28 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>> That has at least as many failure modes as the other representation.

> I don't follow, what do you mean with "failure modes"? The oid in the
> filename? I suggested to use a sequence instead but you didn't comment on
> that. Are there any other failure modes which could cause a diff -r between
> two different databases to break?

AFAIK the primary failure modes for diff'ing text dumps are

(1) randomly different ordering of objects from one dump to another.
Your initial proposal would avoid that problem as long as the object
OIDs didn't change, but since it falls down completely across a dump and
reload, or delete and recreate, I can't really see that it's a step
forward. Using a sequence number generated by pg_dump doesn't change
this at all --- the sequence would be just as unpredictable.

(2) randomly different ordering of rows within a table. Your patch
didn't address that, unless I misunderstood quite a bit.

I think the correct fix for (1) is to improve pg_dump's method for
sorting objects. It's not that bad now, but it does have issues with
random ordering of similarly-named objects. IIRC Peter Eisentraut
proposed something for this last winter but it seemed a mite too ugly,
and he got beaten down to just this:

commit 1acc06a1f4ae752793d2199d8d462a6708c8acc2
Author: Peter Eisentraut <peter_e(at)gmx(dot)net>
Date: Mon Feb 15 19:59:47 2010 +0000

When sorting functions in pg_dump, break ties (same name) by number of arguments

Maybe you can do better, but I'd suggest going back to reread the
discussion that preceded that patch.
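To illustrate the point about sorting, here is a minimal sketch of a diff-stable ordering, assuming a hypothetical simplified object record (`DumpObject`, `sort_key`, `stable_order` and `describe` are all illustrative names, not pg_dump code). Objects are ordered by type, schema and name, and same-name ties are broken by argument count, as in the commit above. The extra tie-break on argument type names is this sketch's own assumption, added so overloads with the same arity also sort deterministically. OIDs are deliberately left out of the key, so the same set of objects sorts identically before and after a dump and reload. Type is compared as a plain string for simplicity, and the sketch ignores any ordering constraints imposed by dependencies between objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DumpObject:
    # Hypothetical, simplified stand-in for a catalog object a dump would emit.
    obj_type: str
    schema: str
    name: str
    arg_types: tuple = ()  # only meaningful for functions
    oid: int = 0           # changes across dump/reload; NOT part of the sort key


def sort_key(obj):
    # Type, schema, name; break same-name ties by argument count (as in the
    # commit), then by argument type names (an assumption of this sketch).
    return (obj.obj_type, obj.schema, obj.name, len(obj.arg_types), obj.arg_types)


def stable_order(objects):
    return sorted(objects, key=sort_key)


def describe(objects):
    # OID-free view of the sorted output, i.e. what would end up in the dump text.
    return [(o.obj_type, o.schema, o.name, o.arg_types) for o in stable_order(objects)]


# Same logical objects, different input order and different OIDs,
# as might happen after a dump and reload.
before = [
    DumpObject("FUNCTION", "public", "f", ("integer", "text"), oid=16390),
    DumpObject("FUNCTION", "public", "f", ("integer",), oid=16385),
    DumpObject("TABLE", "public", "t", oid=16400),
]
after_reload = [
    DumpObject("TABLE", "public", "t", oid=24600),
    DumpObject("FUNCTION", "public", "f", ("integer",), oid=24610),
    DumpObject("FUNCTION", "public", "f", ("integer", "text"), oid=24590),
]

if __name__ == "__main__":
    print(describe(before) == describe(after_reload))
```

Because nothing in the key depends on OIDs or input order, the two lists produce identical output even though every OID differs.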

> (This might be a bad idea for some other reason, but I noticed a few other
> users requesting the same feature when I googled "pg_dump split".)

AFAIR what those folk really wanted was a selective dump with more
selectivity knobs than exist now. I don't think their lives would be
improved by having to root through a twisty little maze of numbered
files to find the object they wanted.

regards, tom lane
