Re: pg_dump --split patch

From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: Joel Jacobson <joel(at)gluefinance(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg_dump --split patch
Date: 2010-12-28 20:44:54
Message-ID: AANLkTi=M0xquuPONK3qjbnw0jtDzP3VL-OyXe6_Ohkz6@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 28, 2010 at 2:39 PM, Joel Jacobson <joel(at)gluefinance(dot)com> wrote:

> 2010/12/28 Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
>
>> I would suggest the directory structure as:
>>
>> /crypt/pg.dump-split/schema-name-1/VIEWS/view-name-1.sql
>> /crypt/pg.dump-split/schema-name-1/TABLES/table-name-1.sql
>> ...
>> /crypt/pg.dump-split/schema-name-2/VIEWS/view-name-1.sql
>> /crypt/pg.dump-split/schema-name-2/TABLES/table-name-1.sql
>>
>> This might be more amenable to diff'ing the different dumps. Schemas are
>> a logical grouping of other objects, and hence making that apparent in
>> your dump's hierarchy makes more sense.
>>
>
> Thanks Gurjeet and Tom for good feedback!
>
> I've made some changes and attached new patches.
> Looks much better now I think!
>
> This is what I've changed,
>
> *) Not using oid anymore in the filename
> *) New filename/path structure: [-f
> filename]-split/[schema]/[desc]/[tag].sql
> *) If two objects share the same name tag for the same [schema]/[desc], -2,
> -3, etc. is appended to the name. Example:
> ~/pg.dump-split/public/FUNCTION/foobar.sql
> ~/pg.dump-split/public/FUNCTION/foobar-2.sql
> ~/pg.dump-split/public/FUNCTION/barfoo.sql
> ~/pg.dump-split/public/FUNCTION/barfoo-2.sql
> ~/pg.dump-split/public/FUNCTION/barfoo-3.sql
>
> I think you are right about functions (and aggregates) being the only
> desc-type where two objects can share the same name in the same schema.
> This means the problem of dumping objects in different order is a very
> limited problem, only affecting overloaded functions.
>
> I didn't include the arguments in the file name, as it would lead to very
> long file names unless truncated, and since the problem is very limited, I
> think we shouldn't include it. It's cleaner with just the name part of the
> tag in the file name.
>
>
I haven't seen your code yet, but we need to make sure that in case of a name
collision we emit the object definitions in sorted order, so that the dump
is always deterministic: func1(char) should _always_ be dumped before
func1(int); that is, the output file names must always be deterministic.
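
To illustrate the ordering requirement, here is a minimal Python sketch (not
pg_dump code, which is C, and not taken from the patch). The function name
assign_file_names and the (schema, desc, name, args) tuple layout are
assumptions made up for this example. It sorts colliding objects by their
argument list before handing out the -2, -3 suffixes, so the result does not
depend on the order in which the objects happen to be read:

```python
from collections import defaultdict


def assign_file_names(objects):
    """Map each (schema, desc, name, args) tuple to a relative file path.

    Objects sharing the same schema/desc/name are sorted by their argument
    string first, so the numeric suffix does not depend on input order.
    """
    groups = defaultdict(list)
    for schema, desc, name, args in objects:
        groups[(schema, desc, name)].append(args)

    paths = {}
    for (schema, desc, name), arg_lists in groups.items():
        # Sorting here is what makes the suffix assignment repeatable.
        for index, args in enumerate(sorted(arg_lists), start=1):
            suffix = "" if index == 1 else f"-{index}"
            paths[(schema, desc, name, args)] = f"{schema}/{desc}/{name}{suffix}.sql"
    return paths


if __name__ == "__main__":
    dumped = assign_file_names([
        ("public", "FUNCTION", "func1", "int"),
        ("public", "FUNCTION", "func1", "char"),
    ])
    for key, path in sorted(dumped.items()):
        print(key, "->", path)
```

With this scheme func1(char) gets func1.sql and func1(int) gets func1-2.sql
whichever one is encountered first.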

The problem I see with suffixing a sequence id to objects with a name
collision is that one day the dump may name myfunc(int) as myfunc.sql, and
after an overloaded version is created, say myfunc(char, int), the same
myfunc(int) may be dumped in myfunc-2.sql, which again is non-deterministic.
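
The renaming effect can be reproduced with a small Python sketch. None of
this is pg_dump code: sequence_names and hashed_name are hypothetical helpers
written only for this illustration. The first models the sequence-suffix
scheme with sorted input; the second shows one possible alternative, not
proposed anywhere in this thread, where the suffix is derived from the
object's own argument list so that it cannot change when a sibling overload
is added:

```python
import hashlib


def sequence_names(name, signatures):
    """Sequence-suffix scheme: sort the signatures, then number them."""
    names = {}
    for index, sig in enumerate(sorted(signatures), start=1):
        suffix = "" if index == 1 else f"-{index}"
        names[sig] = f"{name}{suffix}.sql"
    return names


def hashed_name(name, signature):
    """Hypothetical alternative: the suffix depends only on this signature."""
    digest = hashlib.sha1(signature.encode("utf-8")).hexdigest()[:8]
    return f"{name}-{digest}.sql"


# Before the overload exists, myfunc(int) is the only myfunc.
before = sequence_names("myfunc", ["int"])
# After myfunc(char, int) is created, it sorts ahead of myfunc(int).
after = sequence_names("myfunc", ["int", "char, int"])

if __name__ == "__main__":
    print(before["int"], "->", after["int"])
    print(hashed_name("myfunc", "int"))
```

Here myfunc(int) moves from myfunc.sql to myfunc-2.sql even though the
function itself did not change, while the hashed name stays the same across
both dumps. The trade-off is a less readable file name, which is in the same
spirit as the concern about putting the arguments in the file name.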

Also, it is project policy that we do not introduce new features in back
branches, so spending time on an 8.4.6 patch may not be the best use of your
time.

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh(dot)gurjeet(at){ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device
