signal handling in plpython

From: Mario De Frutos Dieguez <mariodefrutos(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: signal handling in plpython
Date: 2016-10-13 17:57:59
Message-ID: CAFYwGJ3+Xg7EcL2nU-MxX6p+O6c895Pm3mYZ-b+9n9DffEh5MQ@mail.gmail.com
Lists: pgsql-hackers

Hello everyone :).

First of all, I'd like to introduce myself to this list. My name is Mario
de Frutos and I work at CARTO :)

I'm writing to ask for advice because we're seeing some unexpected
behavior when we try to interrupt plpython functions that perform
CPU-intensive operations.

Our problem is that we can't interrupt these functions while they are
performing CPU-intensive operations. For example, while computing Moran's I
with PySAL, the Postgres SIGINT handler is not able to cancel the function.

Here are some possible solutions that I've tried without success:

- If we don't add a custom signal handler, we can't interrupt the function
while it is performing CPU-intensive operations. When the `SIGINT` signal
is sent, the function is not interrupted until it finishes.
- If we add a custom handler for `SIGINT`, we can interrupt the
CPU-intensive function, but we can no longer interrupt data-fetching
operations like `plpy.execute(query)`, because we have overridden the
Postgres handler for that signal.
- As a third option, I've added a Python context manager that wraps, for
testing purposes, the CPU-intensive part (the Moran function from PySAL):
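```
import signal
from contextlib import contextmanager

# `plpy` is provided automatically inside PL/Python functions.

def _signal_handler(signal_code, frame):
    # Abort the running function with a PL/Python error
    plpy.error('INTERRUPTED BY USER!!')

@contextmanager
def interruptible():
    try:
        # Replace the Postgres SIGINT handler with our own
        signal.signal(signal.SIGINT, _signal_handler)
        yield
    finally:
        # Restore the default behaviour for the signal
        signal.signal(signal.SIGINT, signal.SIG_DFL)
```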
This doesn't work as expected: in the `finally` clause we reset the signal
to its default disposition (`SIG_DFL`, whose action for SIGINT is to
terminate the process), but in Postgres the behavior for SIGINT is defined
by a [custom handler](https://github.com/postgres/postgres/blob/master/src/include/tcop/tcopprot.h#L66).
If we try to retrieve the old handler with `signal.getsignal`, we get
`None`, because that handler was not installed from Python.

After going back and forth, I came up with two possible solutions:
- [Custom code](https://github.com/CartoDB/postgres/commit/5b159b1cce6da38c2c67d4058d544ff9bb179480)
in `plpython` that lets us restore the default Postgres signal handler
after the CPU-intensive function finishes. It seems to work, but I'm still
testing it. Because we call it explicitly, we can put it in the `finally`
part of a decorator or context manager.
- Reset the signal handler at the beginning of `plpy.execute` and similar
functions, as done
[here](https://github.com/CartoDB/postgres/commit/5b159b1cce6da38c2c67d4058d544ff9bb179480#diff-4d0cb76412a1c4ee5d9c7f76ee489507R185).

As a bonus, we'd also like to handle SIGALRM to mimic the
`statement_timeout` behavior.

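For the SIGALRM idea, a minimal standalone sketch in plain Python follows, assuming it runs in the main thread of an ordinary Python process on a POSIX system. It arms a one-shot timer with `signal.setitimer` and turns the alarm into an exception; `time_limit` and `TimeoutExpired` are hypothetical names. Inside a backend this would likely hit the same wall as SIGINT, since Postgres installs its own SIGALRM handler for its timeout machinery (which is what implements `statement_timeout`), so `signal.signal` would return `None` there and the original handler could not be restored from Python.

```python
import signal
from contextlib import contextmanager


class TimeoutExpired(Exception):
    """Hypothetical error raised when the time budget is exceeded."""


@contextmanager
def time_limit(seconds):
    def _on_alarm(signum, frame):
        # Runs in the main thread between bytecodes; abort the wrapped block.
        raise TimeoutExpired("timed out after %.3f s" % seconds)

    # signal.signal returns the previous handler, or None if that handler
    # was not installed from Python (the situation inside a backend).
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    # One-shot real-time timer: SIGALRM fires once after `seconds`.
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Disarm the timer first so it cannot fire after we leave the block.
        signal.setitimer(signal.ITIMER_REAL, 0)
        # None cannot be passed back to signal.signal; fall back to SIG_DFL.
        signal.signal(signal.SIGALRM,
                      previous if previous is not None else signal.SIG_DFL)


if __name__ == "__main__":
    try:
        with time_limit(0.5):
            while True:  # stand-in for a long CPU-bound computation
                pass
    except TimeoutExpired as exc:
        print(exc)
```

Disarming the timer before restoring the previous handler keeps a late alarm from reaching whatever handler was there before; the `SIG_DFL` fallback is exactly the lossy case described above for SIGINT.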
I don't know if there is a better way to implement this. I know we're
pushing plpython beyond its intended scope, but any advice is welcome :)
