Re: patch for parallel pg_dump

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch for parallel pg_dump
Date: 2012-02-07 15:59:01
Message-ID: CA+Tgmoa+8A8fuH-GKi1znuizwnJuDpCWWGSTVZ2foaD0SfVYeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 3, 2012 at 10:43 AM, Joachim Wieland <joe(at)mcknight(dot)de> wrote:
> On Fri, Feb 3, 2012 at 7:52 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> If you're OK with that much I'll go do it.
>
> Sure, go ahead!

It turns out that (as you anticipated) there are some problems with my
previous proposal. For one thing, an Archive isn't really just a
connection, because it's also used as a data sink by passing it to
functions like ArchiveEntry(). For two things, as you probably
realized but I failed to, ConnectDatabase() is already setting
AH->connection to the same PGconn it returns, so the idea that we can
potentially have multiple connection objects using the existing
framework is not really true; or at least it's going to require more
work than I originally thought. I think it might still be worth doing
at some point, but I think we probably need to clean up some of the
rest of this mess first.

I've now rejiggered things so that the Archive is passed down to
everything that needs it, so the global variable g_fout is gone. I've
also added a couple of additional accessors to pg_backup_db.c so that
most places that issue queries can simply use those routines without
needing to peek under the hood into the ArchiveHandle. This is not
quite enough to get rid of g_conn, but it's close: the major stumbling
block at this point is probably exit_nicely(). The gyrations we're
going through to make sure that AH->connection gets closed before
exiting are fairly annoying; maybe we should invent something in
dumputils.c along the line of the backend's on_shmem_exit().

I'm starting to think it might make sense to press on with this
refactoring just a bit further and eliminate the distinction between
Archive and ArchiveHandle. Given a few more accessors (and it really
looks like it would only be a few), pg_dump.c could treat an
ArchiveHandle as an opaque struct, which would accomplish more or less
the same thing as the current design, but in a less confusing fashion
- i.e. without giving the reader the idea that the author desperately
wishes he were coding in C++. That would allow simplification in a
number other places; just to take one example, we wouldn't need both
appendStringLiteralAH and appendStringLiteralAHX.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2012-02-07 16:29:42 Re: incorrect handling of the timeout in pg_receivexlog
Previous Message Robert Haas 2012-02-07 15:58:10 Re: Speed dblink using alternate libpq tuple storage