Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: eric(dot)cyr(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT
Date: 2018-11-06 10:50:33
Message-ID: CAEepm=31R9YQ8xUS19Y=izVYoN_7d3DY5X2M_pkT-3CnwmdHiQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Wed, Oct 24, 2018 at 1:21 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
> > =?utf-8?q?PG_Bug_reporting_form?= <noreply(at)postgresql(dot)org> writes:
> >> SELECT * FROM test_file_fdw_program_limit LIMIT 0;
> >> /*
> >> [38000] ERROR: program "echo "test"" failed Detail: child process exited
> >> with exit code 1
> >> */
>
> > Yeah, I can reproduce this on macOS as well as Linux. Capturing stderr
> > shows something pretty unsurprising:
> > sh: line 1: echo: write error: Broken pipe
> > So the called program is behaving in a somewhat reasonable way: it's
> > detecting EPIPE on its stdout (after we close the pipe), reporting that,
> > and doing exit(1).
> > Unfortunately, it's not clear what we could do about that, short of
> > always reading the whole program output, which is likely to be a
> > cure worse than the disease. If the program were failing thanks to
> > SIGPIPE, we could recognize that as a case we can ignore ... but with
> > behavior like this, I don't see a reliable way to tell it apart from
> > a generic program failure, which surely we'd better report.
>
> After a bit of thought, the problem here is blindingly obvious:
> we generally run the backend with SIGPIPE handing set to SIG_IGN,
> and evidently popen() allows the called program to inherit that,
> at least on these platforms.
>
> So what we need to do is make sure the called program inherits SIG_DFL
> handling for SIGPIPE, and then special-case that result as not being
> a failure. The attached POC patch does that and successfully makes
> the file_fdw problem go away for me.
>
> It's just a POC because there are some things that need more thought
> than I've given them:
>
> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
> launched through OpenPipeStream? If not, what infrastructure do we
> need to add to control that? In particular, is it sane to revert
> SIGPIPE for a pipe program that we will write to not read from?
> (I thought probably yes, because that is the normal Unix behavior,
> but it could be debated.)
>
> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
> SIGPIPE failure? (For ordinary COPY, maybe it shouldn't, since
> we don't intend to terminate that early.)

I'm not sure about that. It might in theory be telling you about some
other pipe. If you're OK with that false positive, why not ignore all
errors after you've read enough successful input and decided to close
the pipe?

> 3. Maybe this should be implemented at some higher level?

It won't work for some programs that ignore or catch the signal, so in
theory you might want to give users the power/responsibility to say
"ignore errors that occur after I decide to hang up". Here are three
different behaviours I found in popular software, showing termination
by signal, custom error handling that we can't distinguish, and a
bonehead strategy:

$ seq 1 1000000 | head -5
1
2
3
4
5
... exit code indicates killed by signal

$ python -c "for i in range(1000000): print i" | head -5
0
1
2
3
4
Traceback (most recent call last):
File "<string>", line 1, in <module>
IOError: [Errno 32] Broken pipe
... exit code 1

$ cat Test.java
public class Test {
public static void main(String[] args) {
for (int i = 0; i < 1000000; ++i) {
System.out.println(Integer.toString(i));
}
}
}
$ javac Test.java
$ java Test | head -5
0
1
2
3
4
... wait a really long time with no output, exit code 0

(Explanation: JVMs ignore SIGPIPE and usually convert EPIPE into an IO
exception, except for PrintStreams like System.out which just eat data
after an error...)

> 4. Are there any other signals we ought to be reverting to default
> behavior before launching a COPY TO/FROM PROGRAM?

On my FreeBSD system, I compared the output of procstat -i (= show
signal disposition) for two "sleep 60" processes, one invoked from the
shell and the other from COPY ... FROM PROGRAM. The differences were:
PIPE, TTIN, TTOU and USR2. For the first and last of those, the
default action would be to terminate the process, but the COPY PROGRAM
child ignored them; for TTIN and TTOU, the default action would be to
stop the process, but again they are ignored. Why do bgwriter.c,
startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
regular backends?

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Etsuro Fujita 2018-11-06 12:07:37 Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT
Previous Message Amit Langote 2018-11-06 09:17:49 Re: BUG #15212: Default values in partition tables don't work as expected and allow NOT NULL violation

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2018-11-06 11:00:51 Re: PostgreSQL Limits and lack of documentation about them.
Previous Message John Naylor 2018-11-06 10:47:37 Re: PostgreSQL Limits and lack of documentation about them.