Re: patch for parallel pg_dump

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: patch for parallel pg_dump
Date: 2012-04-03 15:04:41
Message-ID: CA+Tgmoa9=9yRvRZBLq1BdFzVL-K_3FC7uFNDuMoxBhdywd9Ltg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 3, 2012 at 10:40 AM, Joachim Wieland <joe(at)mcknight(dot)de> wrote:
>> On a similar note, what's the point of changing struct Archive to have
>> int numWorkers instead of int number_of_jobs, and furthermore
>> shuffling the declaration around to a different part of the struct?
>
> number_of_jobs was in the struct RestoreOptions before, a structure
> that is not used when doing a dump. I moved it to the Archive as it is
> used by both dump and restore and since all other code talks about
> "workers" I changed the name to "numWorkers".

Gah. Somehow I feel that splitting up this patch into two pieces
hasn't made anything any better.

>> On another note, I am not sure that I like the messaging protocol
>> you've designed.  It seems to me that this has little chance of being
>> reliable:
>>
>> + void (*on_exit_msg_func)(const char *modulename, const char *fmt,
>> va_list ap) = vwrite_msg;
>>
>> I believe the idea here is that you're going to capture the dying gasp
>> of the worker thread and send it to the master instead of printing it
>> out.  But that doesn't seem very reliable.  There's code all over the
>> place (and, in particular, in pg_dump.c) that assumes that we may
>> write messages at any time, and in particular that we may emit
>> multiple messages just before croaking.
>
> I guess you're talking about the code in pg_dump that reads in the
> database schema and the details of all the different objects in the
> schema. This code is run before forking off workers and is always
> executed in the master.

OK, but it seems like a pretty fragile assumption that none of the
workers will ever manage to emit any other error messages. We don't
rely on this kind of assumption in the backend, which is a lot
better-structured and less spaghetti-like than the pg_dump code.

>> Also, I like the idea of making it possible to use assertions in
>> front-end code.  But I think that if we're going to do that, we should
>> make it work in general, not just for things that include
>> pg_backup_archiver.h.
>
> I completely agree. Assertions helped a lot dealing with concurrent
> code. How do you want to tackle this for now? Want me to create a
> separate header pg_assert.h as part of my patch? Or is it okay to
> factor it out later and include it from the general header then?

I'll just go do it, barring objections.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-04-03 15:33:12 invalid search_path complaints
Previous Message Alvaro Herrera 2012-04-03 14:46:47 Re: patch for parallel pg_dump