patch for parallel pg_dump

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: patch for parallel pg_dump
Date: 2012-01-15 18:01:43
Message-ID: CACw0+11NeCcdv2dp4OsEYRoG8rEhKuVui7A8HwbOkBWCSygPXQ@mail.gmail.com
Lists: pgsql-hackers

So this is the parallel pg_dump patch. It generalizes the existing
parallel restore and allows parallel dumps for the directory archive
format. The patch works on Windows and Unix.

In the first phase of a parallel pg_dump/pg_restore, it does catalog
backup/restore in a single process, then forks off worker processes
which are connected to the master process by pipes (on Windows, the
pg_pipe implementation is used). These pipes are only used for a few
commands and status messages. The processes then work on the items
that the master assigns to them. In other words, the worker processes
do not terminate after each item but stay alive until the end of the
parallel part of the dump/restore. Once a worker finishes its current
item and sends the status back to the master, it is assigned the next
item, and so forth.
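
To illustrate the dispatch loop described above, here is a minimal
sketch in Python, not the C code from the patch. It assumes threads
standing in for the forked worker processes, and the command names
"WORK", "TERMINATE" and "OK" are hypothetical. Each worker stays alive,
handles one item per command, and reports a status back over its pipe.

```python
import threading
from multiprocessing import Pipe
from multiprocessing.connection import wait


def worker(conn):
    """Stay alive for the whole parallel phase; handle one item per command."""
    while True:
        cmd, item = conn.recv()
        if cmd == "TERMINATE":      # hypothetical command name
            break
        # A real worker would dump or restore `item` here.
        conn.send(("OK", item))     # status message back to the master
    conn.close()


def run_master(items, n_workers):
    """Hand items to idle workers and collect their status messages."""
    pending = list(items)
    master_ends, threads = [], []
    for _ in range(n_workers):
        master_end, worker_end = Pipe()
        t = threading.Thread(target=worker, args=(worker_end,))
        t.start()
        master_ends.append(master_end)
        threads.append(t)

    done, busy = [], set()
    # Give every worker a first item, as long as there is work.
    for conn in master_ends:
        if pending:
            conn.send(("WORK", pending.pop(0)))
            busy.add(conn)

    # Whenever a worker reports back, assign it the next item.
    while busy:
        for conn in wait(list(busy)):
            done.append(conn.recv())
            if pending:
                conn.send(("WORK", pending.pop(0)))
            else:
                busy.discard(conn)

    # End of the parallel part: tell all workers to exit.
    for conn in master_ends:
        conn.send(("TERMINATE", None))
    for t in threads:
        t.join()
    return done
```

The pipes carry only small command and status tuples here; the actual
data would go through each worker's own database connection.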

In parallel restore, the master closes its own connection to the
database before forking off worker processes, just as it does now. In
parallel dump, however, we need to hold the master's connection open so
that we can detect deadlocks. The issue is that somebody could have
requested an exclusive lock after the master has initially requested a
shared lock on all tables. Therefore, the worker process also requests
a shared lock on the table with NOWAIT and if this fails, we know that
there is a conflicting lock in between and that we need to abort the
dump.
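
A rough sketch of the worker-side check, in Python rather than the C of
the patch. It assumes a DB-API style cursor with an execute() method
and assumes the shared lock is taken in ACCESS SHARE mode; the names
lock_statement, worker_lock_table and DumpAborted are hypothetical.

```python
class DumpAborted(Exception):
    """Raised when a worker cannot get its lock immediately."""


def lock_statement(qualified_table):
    # `qualified_table` is assumed to be already schema-qualified and quoted.
    # ACCESS SHARE is an assumption; the message only says "shared lock".
    return f"LOCK TABLE {qualified_table} IN ACCESS SHARE MODE NOWAIT"


def worker_lock_table(cursor, qualified_table):
    """Try to lock the table without waiting; abort the dump on failure."""
    try:
        cursor.execute(lock_statement(qualified_table))
    except Exception as exc:
        # With NOWAIT the server returns an error instead of queueing the
        # request, so a failure is taken to mean a conflicting lock request
        # arrived after the master took its lock.
        raise DumpAborted(f"could not lock {qualified_table}") from exc
```

Without NOWAIT the worker would queue behind the exclusive lock request,
which itself waits for the master, which in turn waits for the worker.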

Parallel pg_dump sorts the tables and indexes by their sizes so that
it can start with the largest items first.
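
The ordering can be sketched as follows, in Python for brevity. The
(name, size) tuples and the function name are hypothetical; how the
patch actually obtains the sizes is not shown here.

```python
def order_largest_first(items):
    """Return (name, size) pairs sorted by size, largest first.

    sorted() is stable, so items of equal size keep their input order.
    """
    return sorted(items, key=lambda item: item[1], reverse=True)
```

Starting with the largest items reduces the chance that one big table
is picked up last and leaves the other workers idle at the end.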

The connections of the parallel dump use the synchronized snapshot
feature. However, there's also an option --no-synchronized-snapshots
which can be used to dump from an older PostgreSQL version.
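
As a sketch of how connections might share a snapshot, the Python
helpers below only build the SQL statements. They are based on the
documented pg_export_snapshot() and SET TRANSACTION SNAPSHOT interface;
the exact statements the patch issues are an assumption, and the helper
names are hypothetical.

```python
def master_snapshot_statements():
    """Statements for the master: open a transaction, export its snapshot."""
    return [
        "BEGIN ISOLATION LEVEL REPEATABLE READ",
        "SELECT pg_export_snapshot()",  # returns the snapshot identifier
    ]


def worker_snapshot_statements(snapshot_id, synchronized=True):
    """Statements for a worker connection.

    With synchronized=False (the --no-synchronized-snapshots case) the
    worker just opens its own transaction and sees its own snapshot.
    """
    statements = ["BEGIN ISOLATION LEVEL REPEATABLE READ"]
    if synchronized:
        # Quote the identifier as an SQL string literal.
        quoted = "'" + snapshot_id.replace("'", "''") + "'"
        statements.append(f"SET TRANSACTION SNAPSHOT {quoted}")
    return statements
```

Without synchronized snapshots, the workers can see different states of
the database if it is modified while the dump is running.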

I'm also attaching another use case for the parallel backup as a
separate patch: a new archive format that I named "pg_backup_mirror".
It's basically the parallel version of "pg_dump | psql", so it does a
parallel dump and restore of a database from one host to another. The
patch for this is fairly small, but it's still a
bit rough and needs some more work and discussion. Depending on how
quickly (or not) we get done with the review of the main patch, we can
then include this one as well or postpone it.

Joachim

Attachment Content-Type Size
parallel_pg_dump_1.diff text/x-patch 187.5 KB
pg_backup_mirror_1.diff text/x-patch 26.1 KB
