Re: patch for parallel pg_dump

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch for parallel pg_dump
Date: 2012-01-27 16:46:10
Message-ID: CA+Tgmoa_F0+QMbj31rMMdqBa2bqf_nXN=i=6rXomuD1F53BSoA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 15, 2012 at 1:01 PM, Joachim Wieland <joe(at)mcknight(dot)de> wrote:
> So this is the parallel pg_dump patch, generalizing the existing
> parallel restore and allowing parallel dumps for the directory
> archive format; the patch works on Windows and Unix.

It seems a little unfortunate that we are using threads here on
Windows and processes on Linux. Maybe it's too late to revisit that
decision, but it seems like a recipe for subtle bugs.
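
(Just to illustrate the split in question, with hypothetical function
names rather than anything from the patch: on Windows the worker runs
in the parent's address space, while on Unix it operates on a forked
copy of the parent's state, and code that silently assumes one model
or the other is where the subtle bugs creep in.)

    #ifdef WIN32
    #include <process.h>            /* _beginthreadex */
    #else
    #include <unistd.h>             /* fork, _exit */
    #endif

    /* hypothetical worker entry point */
    static void
    run_worker(void)
    {
        /* dump the tables assigned to this worker */
    }

    #ifdef WIN32
    static unsigned __stdcall
    worker_thread(void *arg)
    {
        (void) arg;
        run_worker();               /* sees all of the parent's globals */
        return 0;
    }
    #endif

    static void
    spawn_worker(void)
    {
    #ifdef WIN32
        _beginthreadex(NULL, 0, worker_thread, NULL, 0, NULL);
    #else
        if (fork() == 0)
        {
            run_worker();           /* private copy of the parent's state */
            _exit(0);
        }
    #endif
    }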

> In parallel restore, the master closes its own connection to the
> database before forking off worker processes, just as it does now.
> In parallel dump, however, we need to hold the master's connection
> open so that we can detect deadlocks. The issue is that somebody
> could have requested an exclusive lock after the master has initially
> requested a shared lock on all tables. Therefore, the worker process
> also requests a shared lock on the table with NOWAIT, and if this
> fails, we know that there is a conflicting lock in between and that
> we need to abort the dump.

I think this is an acceptable limitation, but the window where it can
happen seems awfully wide right now. As things stand, it seems like
we don't try to lock the table in the child until we're about to
access it, which means that, on a large database, we could dump out
99% of the database and then be forced to abort the dump because of a
conflicting lock on the very last table. We could fix that by having
every child lock every table right at the beginning, so that all
possible failures of this type would happen before we do any work, but
that will eat up a lot of lock table space. It would be nice if the
children could somehow piggyback on the parent's locks, but I don't
see any obvious way to make that work. Maybe we just have to live
with it the way it is, but I worry that people whose dumps fail 10
hours into a 12-hour parallel dump are going to be grumpy.
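
(Concretely, the per-table check described above comes down to
something like the following libpq sketch; the function name and error
handling here are illustrative, not the patch's actual code:)

    #include <stdio.h>
    #include <libpq-fe.h>

    /*
     * Request the same lock the master already holds, but with NOWAIT.
     * If an exclusive lock has been queued behind the master's ACCESS
     * SHARE lock in the meantime, this fails immediately instead of
     * deadlocking against the master's own connection.
     */
    static int
    lock_table_no_wait(PGconn *conn, const char *qualified_name)
    {
        char        sql[1024];
        PGresult   *res;

        snprintf(sql, sizeof(sql),
                 "LOCK TABLE %s IN ACCESS SHARE MODE NOWAIT",
                 qualified_name);
        res = PQexec(conn, sql);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "could not lock %s: %s",
                    qualified_name, PQerrorMessage(conn));
            PQclear(res);
            return 0;               /* caller must abort the dump */
        }
        PQclear(res);
        return 1;
    }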

> The connections of the parallel dump use the synchronized snapshot
> feature. However there's also an option --no-synchronized-snapshots
> which can be used to dump from an older PostgreSQL version.

Right now, you have it set up so that --no-synchronized-snapshots is
ignored even if synchronized snapshots are supported, which doesn't
make much sense to me. I think you should allow
--no-synchronized-snapshots with any release, and error out if it's
not specified and the version is too old to work without it. It's
probably not a good idea to run with --no-synchronized-snapshots ever,
and doubly so if they're not available, but I'd rather leave that
decision to the user. (Imagine, for example, that we discovered a bug
in our synchronized snapshot implementation.)
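
(For reference, the handshake behind that feature is quite thin at the
libpq level; this sketch uses a made-up function name and omits error
checking:)

    #include <stdio.h>
    #include <libpq-fe.h>

    /*
     * Export the master's snapshot and have a worker adopt it, so that
     * both connections see exactly the same database state.  Requires
     * a 9.2+ server; error checking omitted for brevity.
     */
    static void
    synchronize_worker_snapshot(PGconn *master, PGconn *worker)
    {
        PGresult   *res;
        char        snapshot[64];
        char        sql[128];

        PQclear(PQexec(master,
                "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ"));
        res = PQexec(master, "SELECT pg_export_snapshot()");
        snprintf(snapshot, sizeof(snapshot), "%s",
                 PQgetvalue(res, 0, 0));
        PQclear(res);

        /* must run before the worker's first query in its transaction */
        PQclear(PQexec(worker,
                "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ"));
        snprintf(sql, sizeof(sql),
                 "SET TRANSACTION SNAPSHOT '%s'", snapshot);
        PQclear(PQexec(worker, sql));
    }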

I am tempted to advocate calling the flag --unsynchronized-snapshots,
because to me that underscores the danger a little more clearly, but
perhaps that is too clever.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
