Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: thomas(dot)munro(at)enterprisedb(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, eric(dot)cyr(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT
Date: 2018-11-09 09:27:09
Message-ID: 5BE552ED.4040304@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

(2018/11/09 14:39), Kyotaro HORIGUCHI wrote:
> At Thu, 08 Nov 2018 21:52:31 +0900, Etsuro Fujita<fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote in<5BE4318F(dot)4040002(at)lab(dot)ntt(dot)co(dot)jp>
>> (2018/11/08 10:50), Thomas Munro wrote:
>>> I take back what I said earlier about false positives from other
>>> pipes. I think it's only traditional Unix programs designed for use
>>> in pipelines and naive programs that let SIGPIPE terminate the
>>> process. The accepted answer here gives a good way to think about
>>> it:
>>>
>>> https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist
>>
>> Thanks for the information!
>>
>>> A program sophisticated enough to be writing to other pipes is no
>>> longer in that category and should be setting up signal dispositions
>>> itself, so I agree that we should enable the default disposition and
>>> ignore WTERMSIG(exit_code) == SIGPIPE, as proposed. That is pretty
>>> close to the intended purpose of that signal AFAICS.
>>
>> Great!
>>
>>>>> In the sense of "We don't care the reason", negligible reasons
>>>>> are necessariry restricted to SIGPIPE, evan SIGSEGV could be
>>>>> theoretically ignored safely. "theoretically" here means it is
>>>>> another issue whether we rely on the output from a program which
>>>>> causes SEGV (or any reason other than SIGPIPE, which we caused).
>>>>
>>>> For the SIGSEGV case, I think it would be better that we don't rely on
>>>> the output data, IMO, because I think there might be a possibility
>>>> that
>>>> the program have generated that data incorrectly/unexpectedly.
>>>
>>> +1
>>>
>>> I don't think we should ignore termination by signals other than
>>> SIGPIPE: that could hide serious problems from users. I want to know
>>> if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
>>> happens after we read enough data; there is a major problem that a
>>> human needs to investigate!
>>
>> I think so too.
>
> Ok, I can live with that with no problem.

OK

>>>>> As the result it doesn't report an error for SELECT * FROM ft2
>>>>> LIMIT 1 on "main(void){puts("test1"); return 1;}".
>>>>>
>>>>> =# select * from ft limit 1;
>>>>> a
>>>>> -------
>>>>> test1
>>>>> (1 row)
>>>>>
>>>>> limit 2 reports the error.
>>>>>
>>>>> =# select * from ft limit 2;
>>>>> ERROR: program "/home/horiguti/work/exprog" failed
>>>>> DETAIL: child process exited with exit code 1
>>>>
>>>> I think this would be contrary to users expectations: if the SELECT
>>>> command works for limit 1, they would expect that the command would
>>>> work
>>>> for limit 2 as well. So, I think it would be better to error out that
>>>> command for limit 1 as well, as-is.
>>>
>>> I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
>>> error. For LIMIT 1, we got all the rows we wanted, and then we closed
>>> the pipe. If we got a non-zero non-signal exit code, or a signal exit
>>> code and it was SIGPIPE (not any other signal!), then we should
>>> consider that to be expected.
>>
>> Maybe I'm missing something, but the non-zero non-signal exit code
>> means that there was something wrong with the called program, so I
>> think a human had better investigate that as well IMO, which would
>> probably be a minor problem, though. Too restrictive?
>
> I think Thomas just saying that reading more lines can develop
> problems. According to the current discussion, we should error
> out if we had SEGV when limit 1.

Ah, I misread that. Sorry for the noise.

>>> On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
>>> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>> (2018/11/06 19:50), Thomas Munro wrote:
>>>>> On my FreeBSD system, I compared the output of procstat -i (= show
>>>>> signal disposition) for two "sleep 60" processes, one invoked from the
>>>>> shell and the other from COPY ... FROM PROGRAM. The differences were:
>>>>> PIPE, TTIN, TTOU and USR2. For the first and last of those, the
>>>>> default action would be to terminate the process, but the COPY PROGRAM
>>>>> child ignored them; for TTIN and TTOU, the default action would be to
>>>>> stop the process, but again they are ignored. Why do bgwriter.c,
>>>>> startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
>>>>> regular backends?
>>>>
>>>> So, we should revert SIGUSR2 as well to default processing?
>>>
>>> I don't think it matters in practice, but it might be nice to restore
>>> that just for consistency.
>>
>> Agreed.
>>
>>> I'm not sure what to think about the TTIN,
>>> TTOU stuff; I don't understand job control well right now but I don't
>>> think it really applies to programs run by a PostgreSQL backend, so if
>>> we restore those it'd probably again be only for consistency. Then
>>> again, there may be a reason someone decided to ignore those in the
>>> postmaster + regular backends but not the various auxiliary processes.
>>> Anyone?
>>
>> I don't have any idea about that.
>
> In my understanding processes not connected to a
> terminal(tty/pts) cannot receive TTIN/TTOU (unless someone sent
> it artifically). Since child processes are detached by setsid()
> (on Linux), programs called in that way also won't have a
> controlling terminal at the start time and I suppose they have no
> means to connect to one since they are no longer on the same
> session with postmaster.

For TTIN and TTOU, we would first need to make clear the reason for the
inconsistency Thomas pointed out. I'm wondering if we should leave the
TTIN/TTOU stuff for future work.

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message angell2006 2018-11-09 09:27:54 PostgreSQL error (Segmentation Fault with message "SSL SYSCALL error: EOF detected")
Previous Message Peter Eisentraut 2018-11-09 08:26:53 Re: Tables created WITH OIDS cannot be dumped/restored properly

Browse pgsql-hackers by date

  From Date Subject
Next Message Kato, Sho 2018-11-09 09:36:41 RE: Performance improvements of INSERTs to a partitioned table
Previous Message Karsten Hilbert 2018-11-09 09:25:18 FIXED: backend crash on DELETE, reproducible locally