patch for parallel pg_dump

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: patch for parallel pg_dump
Date: 2012-01-15 18:01:43
Message-ID: CACw0+11NeCcdv2dp4OsEYRoG8rEhKuVui7A8HwbOkBWCSygPXQ@mail.gmail.com
Lists: pgsql-hackers
So this is the parallel pg_dump patch, generalizing the existing
parallel restore and allowing parallel dumps for the directory archive
format. The patch works on Windows and Unix.

In the first phase of a parallel pg_dump/pg_restore, the catalog
backup/restore is done in a single process, which then forks off worker
processes that are connected to the master process by pipes (on Windows,
the pg_pipe implementation is used). These pipes are only used for a few
commands and status messages. The processes then work on the items
that the master assigns to them. In other words, the worker
processes do not terminate after each item but stay around until the
end of the parallel part of the dump/restore. Once they finish their
current item and send the status back to the master, they are assigned
the next item, and so forth.
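
The dispatch loop described above can be sketched roughly as follows.
This is only an illustration in Python, not the patch's C code: threads
stand in for the forked worker processes, multiprocessing.Pipe stands in
for the pipes, and the command names (WORK, DONE, TERMINATE) as well as
run_parallel and handle_item are made up for the example.

```python
import threading
from multiprocessing import Pipe
from multiprocessing.connection import wait


def worker(conn, handle_item):
    # A worker does not exit after one item; it loops until told to terminate.
    while True:
        command = conn.recv()
        if command[0] == "TERMINATE":
            break
        item = command[1]
        handle_item(item)
        conn.send(("DONE", item))  # status report back to the master
    conn.close()


def run_parallel(items, num_workers, handle_item):
    pending = list(items)
    finished = []
    conns, threads = [], []
    for _ in range(num_workers):
        master_end, worker_end = Pipe()
        thread = threading.Thread(target=worker, args=(worker_end, handle_item))
        thread.start()
        conns.append(master_end)
        threads.append(thread)

    # Hand every worker a first item, as far as items are available.
    busy = []
    for conn in conns:
        if pending:
            conn.send(("WORK", pending.pop(0)))
            busy.append(conn)

    # Whenever a worker reports back, give it the next item.
    while busy:
        for conn in wait(busy):
            _status, item = conn.recv()
            finished.append(item)
            if pending:
                conn.send(("WORK", pending.pop(0)))
            else:
                busy.remove(conn)

    # End of the parallel part: only now do the workers go away.
    for conn in conns:
        conn.send(("TERMINATE",))
    for thread in threads:
        thread.join()
    return finished
```

Error handling is left out here; a real worker would also have to
report failures back over the pipe instead of just dying.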

In parallel restore, the master closes its own connection to the
database before forking off worker processes, just as it does now. In
parallel dump, however, we need to hold the master's connection open so
that we can detect deadlocks. The issue is that somebody could have
requested an exclusive lock after the master has initially requested a
shared lock on all tables. Therefore, the worker process also requests
a shared lock on the table with NOWAIT, and if this fails, we know that
there is a conflicting lock in between and that we need to abort the
dump.
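
As a rough sketch of the worker-side check (Python for illustration
only, not the patch's code): the lock mode ACCESS SHARE is an assumption
here, being the mode pg_dump normally takes on tables, and the names
DumpAborted, lock_statement, lock_or_abort and the execute callable are
hypothetical.

```python
class DumpAborted(Exception):
    """Raised when a worker cannot get its table lock immediately."""


def lock_statement(qualified_table):
    # NOWAIT makes the server fail right away instead of queueing the
    # request behind a pending exclusive lock.
    # qualified_table is expected to be already quoted by the caller.
    return "LOCK TABLE %s IN ACCESS SHARE MODE NOWAIT" % qualified_table


def lock_or_abort(execute, qualified_table):
    # `execute` is any callable that sends one SQL statement over the
    # worker's connection and raises on error (hypothetical interface).
    try:
        execute(lock_statement(qualified_table))
    except Exception as exc:
        raise DumpAborted(
            "could not lock %s; a conflicting lock request is pending"
            % qualified_table
        ) from exc
```

Without NOWAIT the worker would wait behind the exclusive lock request,
which in turn waits for the master's shared lock, and the dump would
hang.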

Parallel pg_dump sorts the tables and indexes by their sizes so that
it can start with the largest items first.
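
In Python terms the ordering amounts to something like the following;
the (name, size) pairs and the function name order_largest_first are
made up for the example, and the patch may well track sizes differently.

```python
def order_largest_first(items):
    # items: iterable of (name, size) pairs, size in any consistent unit.
    # Sorting descending by size means the biggest items are handed out first.
    return sorted(items, key=lambda item: item[1], reverse=True)
```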

The connections of the parallel dump use the synchronized snapshot
feature. However, there's also an option --no-synchronized-snapshots
which can be used to dump from an older PostgreSQL version that does
not support it.
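
A sketch of what each worker connection would send, again in Python and
only as an illustration: pg_export_snapshot() and SET TRANSACTION
SNAPSHOT are the server-side pieces of the synchronized snapshot
feature, while the isolation level chosen here, the helper names, and
the example snapshot id are assumptions.

```python
# Run once on the master's connection; the result is the snapshot id.
EXPORT_SNAPSHOT_SQL = "SELECT pg_export_snapshot()"


def quote_literal(value):
    # Minimal SQL string literal quoting for the snapshot id.
    return "'" + value.replace("'", "''") + "'"


def worker_setup_statements(snapshot_id, synchronized=True):
    # SET TRANSACTION SNAPSHOT must be the first thing in a transaction
    # that runs at REPEATABLE READ or SERIALIZABLE.
    statements = ["BEGIN ISOLATION LEVEL REPEATABLE READ"]
    if synchronized:
        statements.append(
            "SET TRANSACTION SNAPSHOT " + quote_literal(snapshot_id)
        )
    # With synchronized=False (the --no-synchronized-snapshots case) each
    # worker simply gets its own snapshot.
    return statements
```

For example, worker_setup_statements("000003A1-1") yields the BEGIN
followed by SET TRANSACTION SNAPSHOT '000003A1-1'.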

I'm also attaching another use-case for the parallel backup as a
separate patch, which is a new archive format that I named
"pg_backup_mirror". It's basically the parallel version of "pg_dump |
psql": it does a parallel dump and restore of a database from one
host to another. The patch for this is fairly small, but it's still a
bit rough and needs some more work and discussion. Depending on how
quickly (or not) we get done with the review of the main patch, we can
then include this one as well or postpone it.


Joachim

Attachment: pg_backup_mirror_1.diff
Description: text/x-patch (26.1 KB)
Attachment: parallel_pg_dump_1.diff
Description: text/x-patch (187.5 KB)
