Re: patch for parallel pg_dump

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch for parallel pg_dump
Date: 2012-01-27 15:57:41
Message-ID: CA+TgmobwKLtWDCDdvTZ1AbnqDPhA6+WewpDMzpvcEFufDvsTgg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jan 15, 2012 at 1:01 PM, Joachim Wieland <joe(at)mcknight(dot)de> wrote:
> So this is the parallel pg_dump patch, generalizing the existing
> parallel restore and allowing parallel dumps for the directory archive
> format, the patch works on Windows and Unix.

This patch introduces a large amount of notational churn, as perhaps
well-illustrated by this hunk:

static int
! dumpBlobs(Archive *AHX, void *arg)
{
+ /*
+ * This is a data dumper routine, executed in a child for
parallel backup,
+ * so it must not access the global g_conn but AH->connection instead.
+ */
+ ArchiveHandle *AH = (ArchiveHandle *) AHX;

It seems pretty grotty to have a static function that gets an argument
of the wrong type, and then just immediately turns around and casts it
to something else. It's not exactly obvious that that's even safe,
especially if you don't know that ArchiveHandle is a struct whose
first element is an Archive. But even if you do know that subclassing
is intended, that doesn't prove that the particular Archive object is
always going to be an ArchiveHandle under the hood. If it is, why not
just pass it as an ArchiveHandle to begin with? I think this needs to
be revised in some way. At least in the few cases I checked, the only
point of getting at the ArchiveHandle this way was to find
AH->connection, which suggests to me either that AH->connection should
be in the "public" section, inside Archive rather than ArchiveHandle,
or else that we ought to just pass the connection object to this
function (and all of its friends who have similar issues) as a
separate argument. Either way, I think that would make this patch
both cleaner and less-invasive. In fact we might want to pull out
just that change and commit it separately to simplify review of the
remaining work.

It's not clear to me why fmtQualifiedId needs to move to dumputils.c.
The way you have it, fmtQualifiedId() is now with fmtId(), but no
longer with fmtCopyColumnList(), the only other similarly named
function in that directory. That may be more logical, or it may not,
but rearranging the code like this makes it a lot harder to review,
and I would prefer that we make such changes as separate commits if
we're going to do them, so that diff can do something sensible with
the changes that are integral to the patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-01-27 15:58:11 Re: patch for parallel pg_dump
Previous Message Marko Kreen 2012-01-27 15:48:11 Re: Speed dblink using alternate libpq tuple storage