Re: pg_dump --split patch

From: Marko Tiikkaja <pgmail(at)joh(dot)to>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Joel Jacobson <joel(at)trustly(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump --split patch
Date: 2012-11-18 23:05:26
Message-ID: 50A969B6.20906@joh.to
Lists: pgsql-hackers

Hi,

On 16/11/2012 15:52, Dimitri Fontaine wrote:
> Marko Tiikkaja <pgmail(at)joh(dot)to> writes:
>> The general output scheme looks like this:
>> schemaname/OBJECT_TYPES/object_name.sql,
>
> I like this feature, I actually did have to code it myself in the past
> and several other people did so, so we already have at least 3 copies of
> `getddl` variants around. I really think this feature should be shipped
> by default with PostgreSQL.
>
> I don't much care for the all uppercase formatting of object type
> directories in your patch though.

*shrug* I have no real preference one way or the other.

>> Overloaded functions are dumped into the same file. Object names are
>> encoded into the POSIX Portable Filename Character Set ([a-z0-9._-]) by
>> replacing any characters outside that set with an underscore.
>
> What happens if you have a table foo and another table "FoO"?

They would go to the same file. If you think there are technical issues
behind that decision (e.g. the dump would not restore), I would like to
hear an example case.
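
To make the collision concrete, here is a minimal Python sketch of the naming rule as described above. It is not the patch's code: the helper names are hypothetical, the `TABLES` directory name is only an example, and lowercasing before the replacement is an assumption inferred from foo and "FoO" ending up in the same file.

```python
import re


def encode_object_name(name: str) -> str:
    """Hypothetical helper: map a name into the set [a-z0-9._-].

    Assumption: the name is lowercased first, then every character
    still outside the set is replaced with an underscore.
    """
    return re.sub(r"[^a-z0-9._-]", "_", name.lower())


def dump_path(schema: str, object_type: str, name: str) -> str:
    """Build schemaname/OBJECT_TYPES/object_name.sql for one object."""
    return f"{encode_object_name(schema)}/{object_type}/{encode_object_name(name)}.sql"


if __name__ == "__main__":
    # Both tables land in the same file under this rule.
    print(dump_path("public", "TABLES", "foo"))  # public/TABLES/foo.sql
    print(dump_path("public", "TABLES", "FoO"))  # public/TABLES/foo.sql
```

Under these assumptions the two tables share one file, so that file would simply hold the definitions of both.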

On the other hand, some people might find it preferable to have them in
different files (for example foo, foo.1, foo.2 etc). Or some might
prefer some other naming scheme. One of the problems with this patch is
exactly that people prefer different things, and providing switches for
all of the different options people come up with would mean a lot of
switches. :-(
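
For illustration only, the foo, foo.1, foo.2 alternative could look roughly like the Python sketch below. Nothing here is in the patch; the function names are hypothetical, and the encoding step assumes lowercasing followed by underscore replacement.

```python
import re
from collections import defaultdict


def encode_object_name(name: str) -> str:
    """Hypothetical helper: lowercase, then replace anything outside [a-z0-9._-]."""
    return re.sub(r"[^a-z0-9._-]", "_", name.lower())


def assign_file_names(names):
    """Give each object its own file name, adding .1, .2, ... on collisions.

    Names are processed in the order given; the first object to claim an
    encoded name keeps it unsuffixed.
    """
    seen = defaultdict(int)  # encoded name -> number of objects seen so far
    result = {}
    for name in names:
        base = encode_object_name(name)
        count = seen[base]
        result[name] = base if count == 0 else f"{base}.{count}"
        seen[base] += 1
    return result


if __name__ == "__main__":
    print(assign_file_names(["foo", "FoO", "FOO", "bar"]))
    # {'foo': 'foo', 'FoO': 'foo.1', 'FOO': 'foo.2', 'bar': 'bar'}
```

Note that this simple scheme is itself ambiguous: an object that is really named "foo.1" would clash with a generated suffix, and the file an object ends up in depends on dump order.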

>> Restoring the dump is supported through an index.sql file containing
>> statements which include (through \i) the actual object files in the dump
>> directory.
>
> I think we should be using \ir now that we have that.

Good point, will have to get that fixed.
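
As a rough sketch of what the index file would then contain (Python, with a hypothetical helper name and example paths; the input is assumed to be already in a valid restore order):

```python
def build_index_sql(relative_paths):
    """Return the text of an index.sql that includes each file via \\ir.

    psql's \\ir resolves the path relative to the directory of the script
    being run, so the dump directory can be restored from any working
    directory. Assumption: paths contain only [a-z0-9._-] and '/', plus
    the object type directory name, so no quoting is needed.
    """
    return "".join(f"\\ir {path}\n" for path in relative_paths)


if __name__ == "__main__":
    files = [
        "public/TABLES/foo.sql",      # example paths, not real patch output
        "public/FUNCTIONS/bar.sql",
    ]
    print(build_index_sql(files), end="")
    # \ir public/TABLES/foo.sql
    # \ir public/FUNCTIONS/bar.sql
```

With \i the same paths would be resolved against the current working directory of the psql session, so the restore would only work when started from inside the dump directory.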

>> Any thoughts? Objections on the idea or the implementation?
>
> As far as the implementation goes, someone with more experience on the
> Archiver Handles should have a look. To me, it looks like you are trying
> to shoehorn your feature in the current API and that doesn't feel good.

It feels a bit icky to me too, but I didn't feel comfortable with
putting in a lot of work to refactor the API because of how
controversial this feature is.

> The holy grail here that we've been speaking about in the past would be
> to separate out tooling and formats so that we have:
>
> pg_dump | pg_restore
> pg_export | psql
>
> In that case we would almost certainly need libpgdump to share the code,
> and we maybe could implement a binary output option for pg_dump too
> (yeah, last time it was proposed we ended up with bytea_output = 'hex').

While I agree that this idea - when implemented - would be nicer in
practically every way, I'm not sure I want to volunteer to do all the
necessary work.

> That libpgdump idea basically means we won't have the --split feature in
> 9.3, and that's really bad, as we already are some releases late on
> delivering that, in my opinion.
>
> Maybe the pg_export and pg_dump tool could share code by just #include
> magic rather than a full blown lib in a first incantation?

That's one idea.

Regards,
Marko Tiikkaja
