Re: pg_dump additional options for performance

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump additional options for performance
Date: 2008-02-26 19:55:29
Message-ID: 1204055729.4252.437.camel@ebony.site
Lists: pgsql-hackers

On Tue, 2008-02-26 at 13:18 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > I've not been advocating improving pg_restore, which is where the -Fc
> > tricks come in.
> > ...
> > I see you thought I meant pg_restore. I don't think extending
> > pg_restore in that way is of sufficiently generic use to make it
> > worthwhile. Extending psql would be worth it, since not all psql scripts
> > come from pg_dump.
>
> OK, the reason I didn't grasp what you are proposing is that it's insane.
>
> We can easily, and backwards-compatibly, improve pg_restore to do
> concurrent restores. Trying to make psql do something like this will
> require a complete rewrite, and there is no prospect that it will work
> for any input that didn't come from (an updated version of) pg_dump
> anyway. Furthermore you will have to write a whole bunch of new code
> just to duplicate what pg_dump/pg_restore already do, ie store/retrieve
> the TOC and dependency info in a program-readable fashion.
>
> Since the performance advantages are still somewhat hypothetical,
> I think we should reach for the low-hanging fruit first. If concurrent
> pg_restore really does prove to be the best thing since sliced bread,
> *then* would be the time to start thinking about whether it's possible
> to do the same thing in less-constrained scenarios.

Working late isn't very helpful if this type of comment is all it
delivers.

I've said that the low hanging fruit is split files from pg_dump, hence
the title of this thread.

I've said I'm *not* planning to work on concurrent pg_restore or
concurrent psql and I don't regard either of them as low hanging fruit.
But concurrent pg_restore is a big waste of time, however low it hangs.
If you have a performance problem on load, putting all your data into a
single huge file is not the right starting point for solving it, and as
Greg points out, data files from other sources are never in pg_dump
format.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
