Re: patch for parallel pg_dump

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch for parallel pg_dump
Date: 2012-03-13 17:53:34
Message-ID: CA+Tgmobi0MdufsO92rgy9eGdXdrAf7n=iKXmN+sR0XR8pwgdug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 12, 2012 at 11:35 PM, Joachim Wieland <joe(at)mcknight(dot)de> wrote:
> How do you mean it's unused? pg_dump_sort.c uses relpages to dump the
> largest tables first. What you don't want to see in a parallel dump is
> a worker starting to dump a large table while everybody else is
> already idle...

What I mean is that the function ArchiveEntry() is defined in
pg_backup_archiver.c, and it takes an argument called relpages, and
the string "relpages" does not appear anywhere else in that file.

>> The backend can have a wrapper function around this that calls ereport
>> using the error_string and error_code, and any front-end code that
>> wants to use this can do so directly.
>
> I tried this actually (patch attached) but then I wanted to test it
> and couldn't find anything that used pgpipe() on Windows.
>
> pg_basebackup/pg_basebackup.c is using it but it's in an #ifndef WIN32
> and the same is true for postmaster/syslogger.c. Am I missing
> something or has this Windows implementation become stale by now? I'll
> append the patch but haven't adapted the pg_dump patch yet to use it.
> Should we still go forward the way you proposed?

Dunno. Can we get an opinion on that from one of the Windows guys?
Andrew, Magnus?

>> +/*
>> + * The parallel error handler is called for any die_horribly() in a
>> child or master process.
>> + * It then takes control over shutting down the rest of the gang.
>> + */
>>
>> I think this needs to be revised to take control in exit_nicely(),
>> maybe by using on_exit_nicely().  Trapping die_horribly() won't catch
>> everything.
>
> It's actually not designed to catch everything. This whole error
> handler thing is only there to report a single error to the user which
> is hopefully the root cause of why everybody is shutting down. Assume
> for example that we cannot get a lock on one table in a worker. Then
> the worker would die_horribly() saying that it cannot get a lock. The
> master would receive that message and shut down. Shutting down for the
> master means killing all the other workers.
>
> The master terminates because a worker died. And all the other workers
> die because the master killed them. Yet the root cause for the
> termination was the fact that one of the workers couldn't get a lock,
> and this is the one and only message that the user should see.
>
> If a child terminates without leaving a message, the master will still
> detect it and just say "a worker process died unexpectedly" (this part
> was actually broken, but now it's fixed :-) )

All that may be true, but I still don't see why it's right for this to
apply in the cases where the worker thread says die_horribly(), but
not in the cases where the worker says exit_horribly().

> Or we change fmtQualifiedId to take an int and then we always pass the
> archive version instead of the Archive* ?

Hmm, I think that might make sense.

>> +enum escrow_action { GET, SET };
>> +static void
>> +parallel_error_handler_escrow_data(enum escrow_action act,
>> ParallelState *pstate)
>> +{
>> +       static ParallelState *s_pstate = NULL;
>> +
>> +       if (act == SET)
>> +               s_pstate = pstate;
>> +       else
>> +               *pstate = *s_pstate;
>> +}
>>
>> This seems like a mighty complicated way to implement a global variable.
>
> Well, we talked about that before, when you complained that you
> couldn't get rid of the global g_conn because of the exit handler.
> You're right that in fact it is an indirect global variable here but
> it's clearly limited to the use of the error handler and you can be
> sure that nobody other than this function writes to it or accesses it
> without calling this function.

Sure, but since all the function does is write to it or access it,
what good does that do me?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2012-03-13 18:10:45 Re: patch for parallel pg_dump
Previous Message Robert Haas 2012-03-13 17:46:24 Re: foreign key locks, 2nd attempt