Re: Parallel pg_dump for 9.1

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel pg_dump for 9.1
Date: 2010-03-29 17:16:04
Message-ID: 4BB0E054.8090406@kaltenbrunner.cc
Lists: pgsql-hackers

Robert Haas wrote:
> On Mon, Mar 29, 2010 at 10:46 AM, Joachim Wieland <joe(at)mcknight(dot)de> wrote:
[...]
>> - Regarding the output of pg_dump I am proposing two solutions. The
>> first one is to introduce a new archive type "directory" where each
>> table and each blob is a file in a directory, similar to the
>> experimental "files" archive type. Also the idea has come up that you
>> should be able to specify multiple directories in order to make use of
>> several physical disk drives. Thinking this further, in order to
>> manage all the mess that you can create with this, every file of the
>> same backup needs to have a unique identifier and pg_restore should
>> have a check parameter that tells you if your backup directory is in a
>> sane and complete state (think about moving a file from one backup
>> directory to another one or trying to restore from two directories
>> which are from different backup sets...).
>
> I think that specifying several directories is a piece of complexity
> that would be best left alone for a first version of this. But a
> single directory with multiple files sounds pretty reasonable. Of
> course we'll also need to support that format in non-parallel mode,
> and in pg_restore.
>
>> The second solution to the single-file-problem is to generate no
>> output at all, i.e. whatever you export from your source database you
>> import directly into your target database, which in the end turns out
>> to be a parallel form of "pg_dump | psql".
>
> This is a very interesting idea but you might want to get the other
> thing merged first, as it's going to present a different set of
> issues.

I had some prior discussion with Joachim on this (and I suspect I had
some influence on him trying to implement it).
The reason this is really needed is that the current pg_restore -j is
actually a net loss (vs. "pg_dump | psql") in a LOT of scenarios that
are basically "duplicate this database to that location" (or any
migration, really).
The example at hand is a 240GB production database with around 850
tables. It takes ~145min to dump that database single-threaded
(completely CPU bound); simply loading the SQL using psql can restore it
in ~150min (again CPU bound, both for COPY and index creation), and
pg_restore -j8 brings the restore down to ~55min.
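
For reference, and with placeholder database/file names, the two
workflows being compared here look roughly like this:

    pg_dump sourcedb | psql targetdb        # plain SQL dump piped straight into the target
    pg_dump -Z0 -Fc -f dump.Fc sourcedb     # uncompressed custom-format dump to disk
    pg_restore -j8 -d targetdb dump.Fc      # parallel restore with 8 jobs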
So if you do the math (and a bit of handwaving):

* using pg_dump | psql you get greatest(145,150) -> 150min
* using pg_dump -Z0 -Fc && pg_restore -j8 you get 145+55 -> 200min
* using a theoretical parallel pg_dump and the existing parallel restore
you would get 50 (just a guess for how fast it might be) + 55 -> 105min
* a parallel dump & restore that can pipeline would end up at
greatest(50,55) -> 55min

So a parallel dump alone would only give you a ~50% speedup in total
time for a migration/upgrade/dump-to-devbox, despite the fact that it
uses 8x the resources. A pipelined solution would result in a ~3x
speedup in total time, and you don't even have to think about potential
problems like having enough disk space on the source/destination to
hold a full temporary dump (and if you don't, you might have to add
some transfer time as well).
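
Purely as an illustration (not how the patch would actually work, and
ignoring that separate pg_dump invocations don't share a single
snapshot), the pipelined idea is conceptually similar to running
per-table copies like:

    pg_dump -t table_a sourcedb | psql targetdb &
    pg_dump -t table_b sourcedb | psql targetdb &
    pg_dump -t table_c sourcedb | psql targetdb &
    wait

except done under one consistent snapshot and with proper scheduling of
the per-table jobs; the table and database names above are just
placeholders.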

Stefan
