Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp
Cc: thomas(dot)munro(at)enterprisedb(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, eric(dot)cyr(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT
Date: 2018-11-09 05:39:31
Message-ID: 20181109.143931.243136889.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

At Thu, 08 Nov 2018 21:52:31 +0900, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <5BE4318F(dot)4040002(at)lab(dot)ntt(dot)co(dot)jp>
> (2018/11/08 10:50), Thomas Munro wrote:
> > I take back what I said earlier about false positives from other
> > pipes. I think it's only traditional Unix programs designed for use
> > in pipelines and naive programs that let SIGPIPE terminate the
> > process. The accepted answer here gives a good way to think about
> > it:
> >
> > https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist
>
> Thanks for the information!
>
> > A program sophisticated enough to be writing to other pipes is no
> > longer in that category and should be setting up signal dispositions
> > itself, so I agree that we should enable the default disposition and
> > ignore WTERMSIG(exit_code) == SIGPIPE, as proposed. That is pretty
> > close to the intended purpose of that signal AFAICS.
>
> Great!
>
> >>> In the sense of "We don't care the reason", negligible reasons
> >>> are necessariry restricted to SIGPIPE, evan SIGSEGV could be
> >>> theoretically ignored safely. "theoretically" here means it is
> >>> another issue whether we rely on the output from a program which
> >>> causes SEGV (or any reason other than SIGPIPE, which we caused).
> >>
> >> For the SIGSEGV case, I think it would be better that we don't rely on
> >> the output data, IMO, because I think there might be a possibility
> >> that
> >> the program have generated that data incorrectly/unexpectedly.
> >
> > +1
> >
> > I don't think we should ignore termination by signals other than
> > SIGPIPE: that could hide serious problems from users. I want to know
> > if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
> > happens after we read enough data; there is a major problem that a
> > human needs to investigate!
>
> I think so too.

Ok, I can live with that with no problem.

> >>> As the result it doesn't report an error for SELECT * FROM ft2
> >>> LIMIT 1 on "main(void){puts("test1"); return 1;}".
> >>>
> >>> =# select * from ft limit 1;
> >>> a
> >>> -------
> >>> test1
> >>> (1 row)
> >>>
> >>> limit 2 reports the error.
> >>>
> >>> =# select * from ft limit 2;
> >>> ERROR: program "/home/horiguti/work/exprog" failed
> >>> DETAIL: child process exited with exit code 1
> >>
> >> I think this would be contrary to users expectations: if the SELECT
> >> command works for limit 1, they would expect that the command would
> >> work
> >> for limit 2 as well. So, I think it would be better to error out that
> >> command for limit 1 as well, as-is.
> >
> > I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
> > error. For LIMIT 1, we got all the rows we wanted, and then we closed
> > the pipe. If we got a non-zero non-signal exit code, or a signal exit
> > code and it was SIGPIPE (not any other signal!), then we should
> > consider that to be expected.
>
> Maybe I'm missing something, but the non-zero non-signal exit code
> means that there was something wrong with the called program, so I
> think a human had better investigate that as well IMO, which would
> probably be a minor problem, though. Too restrictive?

I think Thomas just saying that reading more lines can develop
problems. According to the current discussion, we should error
out if we had SEGV when limit 1.

> > I tried to think of a scenario where the already-received output is
> > truly invalidated by a later error that happens after we close the
> > pipe. It could be something involving a program that uses something
> > optimistic like serializable snapshot isolation that can later decide
> > that whatever it told you earlier is not valid after all. Suppose the
> > program is clever enough to expect EPIPE and not consider that to be
> > an error, but wants to tell us about serialization failure with a
> > non-zero exit code. To handle that, you'd need a way to provide an
> > option to file_fdw to tell it not to ignore non-zero exit codes after
> > close. This seems so exotic and contrived that it's not worth
> > bothering with for now, but could always be added.
>
> Interesting! I agree that such an option could add more flexibility
> in handling the non-zero-exit-code case.

I think the program shoudn't output a line until all possible
output is validated. Once the data source emited a line, the
receiver can't do other than believe that it won't be withdrawn.

> > BTW just for curiosity:
> >
> > perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
> > Exit code: terminated by SIGPIPE, like seq
>
> Good to know!

Mmm..I didn't get an error at hand on both CentOS7 and High Sierra.

| $ perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
...
| 4
| $ echo $?
| 0

> > ruby -e 'for i in 1..1000000 do puts i; end' | head -5
> > Exit code: 1, like Python
>
> Sad. Anyway, thanks a lot for these experiments in addition to the
> previous ones!

ruby reported broken pipe but exit status was 0..

create foreign table ft5 (a text) server svf1 options (program 'ruby -e "for i in 1..1000 do puts i; end"');
select * from ft5 limit 5;
a
---
1
...
5
(5 rows)
(no error)

> > On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
> > <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >> (2018/11/06 19:50), Thomas Munro wrote:
> >>> On my FreeBSD system, I compared the output of procstat -i (= show
> >>> signal disposition) for two "sleep 60" processes, one invoked from the
> >>> shell and the other from COPY ... FROM PROGRAM. The differences were:
> >>> PIPE, TTIN, TTOU and USR2. For the first and last of those, the
> >>> default action would be to terminate the process, but the COPY PROGRAM
> >>> child ignored them; for TTIN and TTOU, the default action would be to
> >>> stop the process, but again they are ignored. Why do bgwriter.c,
> >>> startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
> >>> regular backends?
> >>
> >> So, we should revert SIGUSR2 as well to default processing?
> >
> > I don't think it matters in practice, but it might be nice to restore
> > that just for consistency.
>
> Agreed.
>
> > I'm not sure what to think about the TTIN,
> > TTOU stuff; I don't understand job control well right now but I don't
> > think it really applies to programs run by a PostgreSQL backend, so if
> > we restore those it'd probably again be only for consistency. Then
> > again, there may be a reason someone decided to ignore those in the
> > postmaster + regular backends but not the various auxiliary processes.
> > Anyone?
>
> I don't have any idea about that.

In my understanding processes not connected to a
terminal(tty/pts) cannot receive TTIN/TTOU (unless someone sent
it artifically). Since child processes are detached by setsid()
(on Linux), programs called in that way also won't have a
controlling terminal at the start time and I suppose they have no
means to connect to one since they are no longer on the same
session with postmaster.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Munro 2018-11-09 07:32:54 Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT
Previous Message Amit Langote 2018-11-09 05:11:24 Re: BUG #15212: Default values in partition tables don't work as expected and allow NOT NULL violation

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-11-09 05:45:18 Re: Performance improvements of INSERTs to a partitioned table
Previous Message Amit Langote 2018-11-09 05:38:55 Re: using expression syntax for partition bounds