Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: eric(dot)cyr(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT
Date: 2018-10-29 06:58:17
Message-ID: 20181029.155817.216002543.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Hello.

At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <18397(dot)1540297291(at)sss(dot)pgh(dot)pa(dot)us>
> I wrote:
> > =?utf-8?q?PG_Bug_reporting_form?= <noreply(at)postgresql(dot)org> writes:
> >> SELECT * FROM test_file_fdw_program_limit LIMIT 0;
> >> /*
> >> [38000] ERROR: program "echo "test"" failed Detail: child process exited
> >> with exit code 1
> >> */
>
> > Yeah, I can reproduce this on macOS as well as Linux. Capturing stderr
> > shows something pretty unsurprising:
> > sh: line 1: echo: write error: Broken pipe
> > So the called program is behaving in a somewhat reasonable way: it's
> > detecting EPIPE on its stdout (after we close the pipe), reporting that,
> > and doing exit(1).
> > Unfortunately, it's not clear what we could do about that, short of
> > always reading the whole program output, which is likely to be a
> > cure worse than the disease. If the program were failing thanks to
> > SIGPIPE, we could recognize that as a case we can ignore ... but with
> > behavior like this, I don't see a reliable way to tell it apart from
> > a generic program failure, which surely we'd better report.
>
> After a bit of thought, the problem here is blindingly obvious:
> we generally run the backend with SIGPIPE handing set to SIG_IGN,
> and evidently popen() allows the called program to inherit that,
> at least on these platforms.
>
> So what we need to do is make sure the called program inherits SIG_DFL
> handling for SIGPIPE, and then special-case that result as not being
> a failure. The attached POC patch does that and successfully makes
> the file_fdw problem go away for me.
>
> It's just a POC because there are some things that need more thought
> than I've given them:
>
> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
> launched through OpenPipeStream? If not, what infrastructure do we
> need to add to control that? In particular, is it sane to revert
> SIGPIPE for a pipe program that we will write to not read from?
> (I thought probably yes, because that is the normal Unix behavior,
> but it could be debated.)

Forcing to ignore SIGPIPE lets the program to handle EPIPE, which
might not be handled poperly since it would be unexpected by the
program.

If we don't read from the pipe (that is, we are writing to it),
anyway we shouldn't change the behavior since SIGPIPE can come
from another pipe.

> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
> SIGPIPE failure? (For ordinary COPY, maybe it shouldn't, since
> we don't intend to terminate that early.)

Since the SIGPIPE may come from another pipe, I think we
shouldn't generally. But iff we are explicitly stop reading from
the pipe before detecting an error, it can be ignored since we
aren't interested in further failure. Thus ClosePipeToProgram
might need an extra parameter or CopyState may need an additional
flag named, say, "explicit_close" (or
"ignore_sigpipe_on_pclose"..) in CopyState which will be set in
EndCopy.

> 3. Maybe this should be implemented at some higher level?

I'm not sure, but it seems to me not needed.

> 4. Are there any other signals we ought to be reverting to default
> behavior before launching a COPY TO/FROM PROGRAM?

All handlers other than SIG_DEF and IGN are reset on exec().
Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
harm anything. Perhaps it would be safer against future changes
if we explicitly reset all changed actions to SIG_DEF, but it
might be too-much..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Amit Langote 2018-10-29 07:32:03 Re: BUG #15437: Segfault during insert into declarative partitioned table with a trigger creating partition
Previous Message Michael Paquier 2018-10-29 06:57:14 Re: BUG #15437: Segfault during insert into declarative partitioned table with a trigger creating partition

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-10-29 07:06:00 Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
Previous Message legrand legrand 2018-10-29 06:39:19 RE: [Proposal] Add accumulated statistics for wait event