Re: Parallel pg_dump's error reporting doesn't work worth squat

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Parallel pg_dump's error reporting doesn't work worth squat
Date: 2016-05-25 04:01:37
Message-ID: 20160525.130137.175160263.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

# Note for convenience for others: The commit fixing the original
# issue is 1aa41e3eae3746e05d0e23286ac740a9a6cee7df.

At Mon, 23 May 2016 13:40:30 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <1336(dot)1464025230(at)sss(dot)pgh(dot)pa(dot)us>
> I wrote:
> >> ... the pg_dump process has crashed with a SIGPIPE without printing
> >> any message whatsoever, and without coming anywhere near finishing the
> >> dump.
>
> > Attached is a proposed patch for this. It reverts exit_horribly() to
> > what it used to be pre-9.3, ie just print the message on stderr and die.
> > The master now notices child failure by observing EOF on the status pipe.
> > While that works automatically on Unix, we have to explicitly close the
> > child side of the pipe on Windows (could someone check that that works?).
> > I also fixed the parent side to ignore SIGPIPE earlier, so that it won't
> > just die if it was in the midst of sending to the child.
>
> Now that we're all back from PGCon ... does anyone wish to review this?
> Or have an objection to treating it as a bug fix and patching all branches
> back to 9.3?

FWIW, it seems to me a bug which should be fixed to back
branches.

I tried this only on Linux (I'll try it on Windows afterward) and
got something like this.

pg_dump: [archiver (db)] pg_dump: [archiver (db)] connection to database "postgres" failed: fe_sendauth: no password supplied
connection to database "postgres" failed: fe_sendauth: no password supplied
pg_dump: [archiver (db)] connection to database "postgres" failed: fe_sendauth: no password supplied
<some repeats>
pg_dump: [parallel archiver] a worker process died unexpectedly
pg_dump: [archiver (db)] connection to database "postgres" failed: fe_sendauth: no password supplied
<some repeats>

Although the error messages from multiple children are cluttered,
it would be inevitable and better than vanishing.

It seems hard to distinguish the meaning among the module names
enclosed by '[]' but it is another issue.

In archive_close_connection, the following change means that
si->AHX could be NULL there, as the existing code for
non-parallel does. But it seems to be set once for
si(=shutdown_info)'s lifetime, before reaching there, to a valid
value.

- DisconnectDatabase(si->AHX);
+ if (si->AHX)
+ DisconnectDatabase(si->AHX);

Shouldn't archive_close_connection have an assertion (si->AHX !=
NULL) instead of checking "if (si->AHX)" in each path?

> > BTW, after having spent the afternoon staring at it, I will assert with
> > confidence that pg_dump/parallel.c is an embarrassment to the project.
>
> I've done a bit of work on a cosmetic cleanup patch, but don't have
> anything to show yet.
>
> regards, tom lane

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-05-25 04:15:57 Re: Parallel pg_dump's error reporting doesn't work worth squat
Previous Message Stephen Frost 2016-05-25 03:33:08 Re: Error during restore - dump taken with pg_dumpall -c option