Re: Parallel pg_dump's error reporting doesn't work worth squat

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Parallel pg_dump's error reporting doesn't work worth squat
Date: 2016-05-14 22:45:13
Message-ID: 24048.1463265913@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

[ getting back to a complaint I made in December ]

I wrote:
> ... the pg_dump process has crashed with a SIGPIPE without printing
> any message whatsoever, and without coming anywhere near finishing the
> dump.

> A bit of investigation says that this is because somebody had the bright
> idea that worker processes could report fatal errors back to the master
> process instead of just printing them to stderr. So when the workers
> fail to establish connections (because of the password problem cited in
> #13727), they don't tell me about that. Oh no, they send those errors
> back up to the pipe to the parent, and then die silently. Meanwhile,
> the parent is trying to send them commands, and since it doesn't protect
> itself against SIGPIPE on the command pipes, it crashes without ever
> reporting anything. If you aren't paying close attention, you wouldn't
> even realize you didn't get a completed dump.

Attached is a proposed patch for this. It reverts exit_horribly() to
what it used to be pre-9.3, ie just print the message on stderr and die.
The master now notices child failure by observing EOF on the status pipe.
While that works automatically on Unix, we have to explicitly close the
child side of the pipe on Windows (could someone check that that works?).
I also fixed the parent side to ignore SIGPIPE earlier, so that it won't
just die if it was in the midst of sending to the child.

BTW, after having spent the afternoon staring at it, I will assert with
confidence that pg_dump/parallel.c is an embarrassment to the project.
It is chock-full of misleading, out-of-date, and/or outright wrong
comments, useless or even wrong Asserts, ill-designed APIs, code that
works quite differently in Unix and Windows cases without a shred of
commentary about the fact, etc etc. I have mostly resisted the temptation
to do cosmetic cleanup in the attached, but this code really needs some
editorial attention.

regards, tom lane

Attachment Content-Type Size
pg_dump-error-reporting-fix-1.patch text/x-diff 15.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh berkus 2016-05-14 22:48:50 Re: 10.0
Previous Message Tom Lane 2016-05-14 21:30:35 Re: exit() behavior on Windows?