Re: Parallel pg_dump for 9.1

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel pg_dump for 9.1
Date: 2010-03-29 15:40:26
Message-ID: 603c8f071003290840wa8b25dfr8eecfdd4e81fd16c@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 29, 2010 at 10:46 AM, Joachim Wieland <joe(at)mcknight(dot)de> wrote:
> - There are ideas on how to solve the consistent-snapshot issue, but
> in the end you can always solve it by stopping your application(s). I
> actually assume that whenever people are interested in a very fast
> dump, it is because they are doing some maintenance task (like
> migrating to a different server) that involves pg_dump. In these
> cases, they would stop their system anyway.
> Even if we had consistent snapshots in a future version, would we
> forbid people from running parallel dumps against old server
> versions? What I suggest is to just display a big warning when run
> against a server without consistent-snapshot support (which currently
> is every version).

Seems reasonable.
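
For what it's worth, a minimal sketch of such a warning, assuming a
libpq connection and a hypothetical num_workers parameter (the names
here are illustrative, not proposed pg_dump code), could look like:

#include <stdio.h>
#include <libpq-fe.h>

/*
 * Illustrative only: warn that the server cannot hand out a synchronized
 * snapshot to parallel dump workers.  At the time of writing that is true
 * for every released server version, so the warning fires whenever more
 * than one worker is requested.
 */
static void
warn_if_no_consistent_snapshot(PGconn *conn, int num_workers)
{
    if (num_workers > 1)
        fprintf(stderr,
                "WARNING: server version %d cannot provide a single consistent\n"
                "snapshot to %d parallel dump workers; each worker will see its own\n"
                "snapshot, so stop concurrent writes before dumping.\n",
                PQserverVersion(conn), num_workers);
}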

> - Regarding the output of pg_dump I am proposing two solutions. The
> first one is to introduce a new archive type "directory", where each
> table and each blob is a file in a directory, similar to the
> experimental "files" archive type. The idea has also come up that you
> should be able to specify multiple directories in order to make use
> of several physical disk drives. Taking this further, in order to
> manage the mess that you can create with this, every file of the same
> backup needs to carry a unique identifier, and pg_restore should have
> a check option that tells you whether your backup directory is in a
> sane and complete state (think of moving a file from one backup
> directory to another, or trying to restore from two directories that
> belong to different backup sets...).

I think that specifying several directories is a piece of complexity
that would be best left alone for a first version of this. But a
single directory with multiple files sounds pretty reasonable. Of
course we'll also need to support that format in non-parallel mode,
and in pg_restore.
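
To make the sanity-check idea concrete, here is a rough sketch of what
it could look like for a directory archive. The "toc.dat" name and its
layout (first line a backup-set identifier, one data file name per
remaining line) are invented purely for illustration:

#include <stdio.h>
#include <string.h>

/*
 * Rough sketch of the proposed sanity check for a directory archive.
 * A real implementation would also verify that each data file carries
 * the same backup-set identifier as the TOC.
 */
static int
check_directory_archive(const char *dir)
{
    char    tocpath[1024];
    char    backup_id[64];
    char    line[1024];
    FILE   *toc;
    int     ok = 1;

    snprintf(tocpath, sizeof(tocpath), "%s/toc.dat", dir);
    if ((toc = fopen(tocpath, "r")) == NULL)
    {
        fprintf(stderr, "%s: no table of contents found\n", dir);
        return 0;
    }

    /* First line: the identifier shared by every file of this backup set. */
    if (fgets(backup_id, sizeof(backup_id), toc) == NULL)
        ok = 0;

    /* Remaining lines: one data file per table or blob; check each exists. */
    while (ok && fgets(line, sizeof(line), toc) != NULL)
    {
        char    path[2048];
        FILE   *f;

        line[strcspn(line, "\n")] = '\0';
        if (line[0] == '\0')
            continue;
        snprintf(path, sizeof(path), "%s/%s", dir, line);
        if ((f = fopen(path, "r")) == NULL)
        {
            fprintf(stderr, "missing data file: %s\n", path);
            ok = 0;
        }
        else
            fclose(f);
    }
    fclose(toc);
    return ok;
}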

> The second solution to the single-file problem is to generate no
> output at all, i.e. whatever you export from your source database is
> imported directly into your target database, which in the end turns
> out to be a parallel form of "pg_dump | psql".

This is a very interesting idea, but you might want to get the other
thing merged first, as it's going to present a different set of
issues.
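
The core of a worker in that scheme would presumably just be a
COPY-to-COPY loop over libpq. A minimal sketch (error handling
abbreviated, table name assumed to be already quoted) might be:

#include <stdio.h>
#include <libpq-fe.h>

/*
 * Sketch of one worker streaming a single table from the source to the
 * target database, i.e. the per-table building block of a parallel
 * "pg_dump | psql".
 */
static int
copy_one_table(PGconn *src, PGconn *dst, const char *table)
{
    char    sql[1024];
    char   *buf;
    int     len;

    snprintf(sql, sizeof(sql), "COPY %s TO STDOUT", table);
    PQclear(PQexec(src, sql));      /* expect PGRES_COPY_OUT */

    snprintf(sql, sizeof(sql), "COPY %s FROM STDIN", table);
    PQclear(PQexec(dst, sql));      /* expect PGRES_COPY_IN */

    /* Shovel rows from the source to the target until COPY OUT ends. */
    while ((len = PQgetCopyData(src, &buf, 0)) > 0)
    {
        int     rc = PQputCopyData(dst, buf, len);

        PQfreemem(buf);
        if (rc != 1)
            return 0;
    }

    if (PQputCopyEnd(dst, NULL) != 1)
        return 0;
    PQclear(PQgetResult(src));      /* final result of COPY TO */
    PQclear(PQgetResult(dst));      /* final result of COPY FROM */

    return len == -1;               /* -1 means COPY OUT finished cleanly */
}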

> I am currently not planning to make parallel dumps work with the
> custom format even though this would be possible if we changed the
> format to a certain degree.

I'm thinking we probably don't want to change the existing formats.

...Robert
