Re: Possible major bug in PlPython (plus some other ideas)

From: Bradley McLean <brad(at)bradm(dot)net>
To: Kevin Jacobs <jacobs(at)penguin(dot)theopalgroup(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible major bug in PlPython (plus some other ideas)
Date: 2001-11-09 15:19:39
Message-ID: 20011109101939.A16592@bradm.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

* Kevin Jacobs (jacobs(at)penguin(dot)theopalgroup(dot)com) [011109 08:59]:
>
> 1) If Plpython is installed as a trusted language, and from what little I
> can glean from the documentation, it should not have any filesystem access.
> However, the default behavior of the restricted execution environment
> being used allows read-only filesystem access.
>...
> It is fairly easy to override this method to unconditionally raise an
> access exception.

Agreed. Just to amplify the point below, there is currently no
distinction between trusted and untrusted in PLpython (hmm, need
a doc change to identify this?). lanpltrusted appears nowhere
in the implementation.

> 2) I'm not sure why the TD dictionary exists. Why not create objects
> 'new', 'old' or 'event' in the global namespace when the interpreter is
> called in the appropriate contexts? The current way is unwieldy, not
> very 'Pythonic' (a frequent justification for change in the Python
> world), and not consistent with other PostgreSQL procedural backends.
> Its possible to keep TD for backward compatibility, so there is no
> downside.
>
> 3) 'old' and 'new' should also provide class-like syntax:
>
> e.g. old.foo, new.baz (using getitem)
>
> instead of
> old['foo'], new['baz'] (using getattr)
>
> Of course we cannot drop the getattr interface, since many valid column
> names are not valid python identifiers (I think -- I haven't looked at
> the SQL grammar lately, though I'm guessing that is the case).

Agree on both.

> 4) Plpython does not use the standard Python boolean checks, which is also
> not very Pythonic and somewhat confusing:
> ...
> I suggest changing the semantics of boolean return values to use
> PyObject_IsTrue(PyObject *o) to properly test for truth values.

Agree. Is this the only type that needs special treatment?

> 5) It should be trivial to create an "untrusted" version of Plpython that
> bypasses the restricted execution environment. This is worthy of some
> consideration, since it may be very useful and can be implemented with
> relative ease.

Strongly agree. I'd like to see it for your "7.2.1" proposal.

> 6) [Very low priority] Its not insane to consider a Plpython procedure
> that spawns off a Python thread to do background processing tasks.
> This is obviously something that will only be possible in an untrusted
> version of the interpreter. Also, if the SPI interface is thread-safe,
> then it may be useful to release the Python interpreter lock around
> some of the SPI calls.

Three weeks ago, I really wanted a feature like this, but I've slowly
been convincing myself that it's a bad idea. Consider the effects of
a race condition in user generated threaded python code that results
in a deadlock within the backend. Extend that to a backend that's
holding a database lock of some kind. The expertise required to
diagnose and debug such a condition would be rather substantial -
threading, python threading, sql, postgres internals, ... in one
person.

I solved my design issue with the use of an external process and
a NOTIFY, and I'm much happier for it. The threaded python stuff
can stay out in another process where the interaction concerns are
minimized.

I will say that the discrepancy between executing SQL from plpython
and from python with the pg driver does cause some cognitive
disconnect, because despite working in the same language, I have
to use different APIs to the same functionality. I don't have a
good proposal to solve it, though.

> OK, so I've got a laundry list of issues. Only the first issue is a real
> show-stopper in my mind, though I'd like to see at least 1-4 addressed
> before 7.2 or 7.2.1, if at all possible. After some discussion, I'll even
> by happy to implement most/all of these items, though I'd like more
> collaboration than just submitting patches blindly for consideration.

I'll happily help.

One last issue:

7) Would it be completely impossible to make an extra-global
dictionary that was shared between multiple back-ends (place
in shared memory)? Anything I'm likely to place in GD, I'm likely
to initialize once and want in all backends.

-Brad McLean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Jacobs 2001-11-09 15:58:17 Re: Possible major bug in PlPython (plus some other ideas)
Previous Message Alex Avriette 2001-11-09 14:31:46 Where might I propose a 'feature'?