Re: pg_dump --split patch

From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: Joel Jacobson <joel(at)gluefinance(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump --split patch
Date: 2010-12-28 17:33:36
Message-ID: AANLkTinLhsPWkcVcc9eaGbX_ZHnhk54V=nNdYxGxY9Zb@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 28, 2010 at 11:00 AM, Joel Jacobson <joel(at)gluefinance(dot)com>wrote:

> Dear fellow hackers,
>
> Problem: A normal diff of two slightly different schema dump files (pg_dump
> -s), will not produce a user-friendly diff, as you get all changes in the
> same file.
>
> Solution: I propose a new option to pg_dump, --split, which dumps each
> object to a separate file in a user friendly directory structure:
>
> [-f filename] : main dump file, imports each splitted part using \i
> [-f filename]-split/[desc]/[tag]/[oid].sql : dump of the oid
>
> Example: If the filename (-f) is "pg.dump", the following directory
> structure would be created:
>
> $ ./pg_dump -f /crypt/pg.dump --split -F p -s glue
> /crypt/pg.dump-split/VIEW/
> /crypt/pg.dump-split/TYPE/
> /crypt/pg.dump-split/TRIGGER/
> /crypt/pg.dump-split/TABLE/
> /crypt/pg.dump-split/SEQUENCE/
> /crypt/pg.dump-split/SCHEMA/
> /crypt/pg.dump-split/PROCEDURAL_LANGUAGE/
> /crypt/pg.dump-split/INDEX/
> /crypt/pg.dump-split/FUNCTION/
> /crypt/pg.dump-split/FK_CONSTRAINT/
> /crypt/pg.dump-split/CONSTRAINT/
> /crypt/pg.dump-split/AGGREGATE/
>
> In each such directory, one directory per object name is created.
> If we would have a function "foobar" with oid "12345" it would be saved to:
> /crypt/pg.dump-split/FUNCTION/foobar/12345.sql
>
> In the "pg.dump" plain text file, the files are "linked in" using the "\i"
> psql command, e.g.:
> \i /crypt/pg.dump-split/FUNCTION/foobar/12345.sql
>
> Potential use-case scenarios:
>
> *) Version control your database schema, by exporting it daily (using
> --split) and commiting the differences.
>
> *) Compare differences of schema dumps created in different points in time.
> Since objects are stored in separate files, it is easier to see what areas
> were modified, compared to looking at the diff of two entire schemas.
>
> *) Restore only some objects, based on type (e.g., only the functions) or
> name (e.g. only fucntions of certain name/names).
>
> I've tested the patch for both the latest HEAD (9.1devel) as well as 8.4.6.
>
>
I would suggest the directory structure as:

/crypt/pg.dump-split/schema-name-1/VIEWS/view-name-1.sql
/crypt/pg.dump-split/schema-name-1/TABLES/table-name-1.sql
...
/crypt/pg.dump-split/schema-name-2/VIEWS/view-name-1.sql
/crypt/pg.dump-split/schema-name-2/TABLES/table-name-1.sql

This might n be more amenable to diff'ing the different dumps. Schemas are
logical grouping of other objects and hence making that apparent in your
dump's hierarchy makes more sense.

Most importantly, as Tom suggested, don't use or rely on OIDs. I think
function overloading is the only case where you can have more than one
object with the same name under a schema. That can be resolved if you
included function signature in filename:

/crypt/pg.dump-split/emp/FUNCTIONS/myfunc-int-char.sql
/crypt/pg.dump-split/emp/FUNCTIONS/myfunc-int-int.sql
/crypt/pg.dump-split/emp/FUNCTIONS/myfunc-int.sql

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh(dot)gurjeet(at){ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Azat Khuzhin 2010-12-28 17:53:26 Re: Bug in configure script for build postgresql
Previous Message Tom Lane 2010-12-28 17:33:06 Re: pg_dump --split patch