Re: patch for parallel pg_dump

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch for parallel pg_dump
Date: 2012-01-31 05:55:04
Message-ID: CACw0+12F6c642g1Uy52VGecDfcHeWLaxCN7yrwv8WF5sDrbYXA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 30, 2012 at 12:20 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> But the immediate problem is that pg_dump.c is heavily reliant on
> global variables, which isn't going to fly if we want this code to use
> threads on Windows (or anywhere else).  It's also bad style.

Technically, since most of pg_dump.c dumps the catalog and since this
isn't done in parallel but only in the master process, most functions
need not be changed for the parallel restore. Only those that are
called from the worker threads need to be changed, this has been done
in e.g. dumpBlobs(), the change that you quoted upthread.

> But it
> seems possible that we might someday want to dump from one database
> and restore into another database at the same time, so maybe we ought
> to play it safe and use different variables for those things.

Actually I've tried that but in the end concluded that it's best to
have at most one database connection in an ArchiveHandle if you don't
want to do a lot more refactoring. Besides the normal connection
parameters like host, port, ... there's also std_strings, encoding,
savedPassword, currUser/currSchema, lo_buf, remoteVersion ... that
wouldn't be obvious where they belonged to.

Speaking about refactoring, I'm happy to also throw in the idea to
make the dump and restore more symmetrical than they are now. I kinda
disliked RestoreOptions* being a member of the ArchiveHandle without
having something similar for the dump. Ideally I'd say there should be
a DumpOptions and a RestoreOptions field (with a "struct Connection"
being part of them containing all the different connection
parameters). They could be a union if you wanted to allow only one
connection, or not if you want more than one.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Kreen 2012-01-31 07:19:22 Re: Speed dblink using alternate libpq tuple storage
Previous Message Tom Lane 2012-01-31 04:29:44 Re: [GENERAL] Why extract( ... from timestamp ) is not immutable?