Re: patch for parallel pg_dump

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch for parallel pg_dump
Date: 2012-01-31 14:05:07
Message-ID: CA+TgmoZhcHMk9WHb0Py5-GsRXLzinRO-boqZN3oTnx+=mk_XXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 31, 2012 at 12:55 AM, Joachim Wieland <joe(at)mcknight(dot)de> wrote:
> On Mon, Jan 30, 2012 at 12:20 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> But the immediate problem is that pg_dump.c is heavily reliant on
>> global variables, which isn't going to fly if we want this code to use
>> threads on Windows (or anywhere else).  It's also bad style.
>
> Technically, since most of pg_dump.c dumps the catalog and since this
> isn't done in parallel but only in the master process, most functions
> need not be changed for the parallel restore. Only those that are
> called from the worker threads need to be changed, this has been done
> in e.g. dumpBlobs(), the change that you quoted upthread.

If we're going to go to the trouble of cleaning this up, I'd prefer to
rationalize it rather than doing just the absolute bare minimum amount
of stuff needed to make it appear to work.

>> But it
>> seems possible that we might someday want to dump from one database
>> and restore into another database at the same time, so maybe we ought
>> to play it safe and use different variables for those things.
>
> Actually I've tried that but in the end concluded that it's best to
> have at most one database connection in an ArchiveHandle if you don't
> want to do a lot more refactoring. Besides the normal connection
> parameters like host, port, ... there's also std_strings, encoding,
> savedPassword, currUser/currSchema, lo_buf, remoteVersion ... that
> wouldn't be obvious where they belonged to.

And just for added fun and excitement, they all have inconsistent
naming conventions and inadequate documentation.

I think if we need more refactoring in order to support multiple
database connections, we should go do that refactoring. The current
situation is not serving anyone well.

> Speaking about refactoring, I'm happy to also throw in the idea to
> make the dump and restore more symmetrical than they are now. I kinda
> disliked RestoreOptions* being a member of the ArchiveHandle without
> having something similar for the dump. Ideally I'd say there should be
> a DumpOptions and a RestoreOptions field (with a "struct Connection"
> being part of them containing all the different connection
> parameters). They could be a union if you wanted to allow only one
> connection, or not if you want more than one.

I like the idea of a struct Connection. I think that could make a lot of sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yeb Havinga 2012-01-31 14:10:35 Re: [v9.2] Add GUC sepgsql.client_label
Previous Message Robert Haas 2012-01-31 13:17:40 Re: foreign key locks, 2nd attempt