
WIP patch for parallel pg_dump

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: WIP patch for parallel pg_dump
Date: 2010-11-14 23:52:55
Lists: pgsql-hackers
This is the second patch for parallel pg_dump, now the actual part that
parallelizes the whole thing. More precisely, it adds parallel
backup/restore to pg_dump/pg_restore for the directory archive format and
keeps the parallel restore part of the custom archive format. Combined with
my directory archive format patch, which also includes a prototype of the
liblzf compression, you can combine this compression with any of the just
mentioned backup/restore scenarios. This patch is on top of the previous
directory patch.

You would run a regular parallel dump with

$ pg_dump -j 4 -Fd -f out.dir dbname

In previous discussions there was a request to add support for multiple
directories, which I have done as well, so that you can also run

$ pg_dump -j 4 -Fd -f dir1:dir2:dir3 dbname

to equally distribute the data among those three directories (we can still
discuss the syntax, I am not all that happy with the colon either...)
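
As a rough illustration of the multi-directory idea, here is a minimal
Python sketch that splits a colon-separated argument and spreads tables over
the directories in round-robin order. This is not taken from the patch: the
function names split_dirs and assign_round_robin are hypothetical, and
round-robin is only an assumed placement policy (the patch may balance by
size or in some other way).

```python
from itertools import cycle


def split_dirs(spec):
    """Split a colon-separated directory argument into a list of directories."""
    dirs = [d for d in spec.split(":") if d]
    if not dirs:
        raise ValueError("no output directory given")
    return dirs


def assign_round_robin(table_names, dirs):
    """Map each table to a directory, cycling through dirs in order.

    Assumed policy for illustration only; not the patch's actual logic.
    """
    return dict(zip(table_names, cycle(dirs)))


dirs = split_dirs("dir1:dir2:dir3")
placement = assign_round_robin(["t1", "t2", "t3", "t4"], dirs)
```

With three directories and four tables, the fourth table wraps around to the
first directory again.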

The dump would always start with the largest objects, by looking at the
relpages column of pg_class, which should give a good estimate. The order of
objects to restore is determined by the dependencies among the objects
(which is already used in the parallel restore of the custom archive type).
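
The largest-first ordering for the dump can be sketched in Python as follows.
This is only an illustration under assumptions, not code from the patch: the
table names and relpages values are made up, schedule_largest_first is a
hypothetical helper, and handing each table to the currently least-loaded
worker merely simulates idle workers picking up the next item.

```python
import heapq


def schedule_largest_first(relpages, n_workers):
    """Order tables by relpages (descending) and assign them to workers.

    relpages: dict mapping table name -> estimated size in pages.
    Returns (dump order, per-worker list of tables).
    """
    # Largest tables first; ties broken by name for a stable order.
    order = sorted(relpages, key=lambda t: (-relpages[t], t))

    # Min-heap of (accumulated pages, worker id): the least-loaded worker
    # stands in for "the next worker to become idle".
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)

    plan = {w: [] for w in range(n_workers)}
    for table in order:
        load, worker = heapq.heappop(heap)
        plan[worker].append(table)
        heapq.heappush(heap, (load + relpages[table], worker))
    return order, plan


# Made-up sizes, as one might read them from pg_class.relpages.
sizes = {"orders": 900, "items": 500, "users": 300, "log": 100}
order, plan = schedule_largest_first(sizes, 2)
```

Starting with the largest table keeps one huge table from being picked up
last and leaving the other workers idle while it finishes.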

The file includes some example commands that I have run here as a kind
of regression test that should give you an impression of how to call it from
the command line.

One thing that is currently missing is proper support for Windows; this is
the next thing that I will be working on. Also, this version still gives
quite a bunch of debug information about what the processes are doing, so
don't try to pipe the pg_dump output anywhere (even when not run in
parallel), it will probably not work...

The missing part that would make parallel pg_dump work with no strings
attached is snapshot synchronization. As long as there are no synchronized
snapshots, you would need to stop writing to your database before starting
the parallel pg_dump. However, it turns out that most often when you are
especially concerned about a fast dump, you have shut down your applications
anyway (which is the reason why you are so concerned about speed in the
first place). These cases are typically database migrations from one
host/platform to another or upgrades without pg_migrator.


Attachment: pg_dump-parallel.diff
Description: text/x-patch (132.4 KB)

