Re: plpython function problem workaround

From: Marco Colombo <pgsql(at)esiway(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Fuhr <mike(at)fuhr(dot)org>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: plpython function problem workaround
Date: 2005-03-19 23:34:04
Message-ID: Pine.LNX.4.61.0503192339290.3005@Megathlon.ESI
Lists: pgsql-general

On Fri, 18 Mar 2005, Tom Lane wrote:

> Marco Colombo <pgsql(at)esiway(dot)net> writes:
>> Right now I'm parsing the string first, changing the resulting
>> parse tree adding missing nodes (def, INDENT, DEINDENT) and
>> then compiling it.
>
> Hmmm ... is this really going to be simpler or more robust than lexing
> the string carefully enough to insert tabs at the right places? The
> impression I had so far was that you'd only need to understand about
> Python's string-literal conventions to get that right ... and that's
> something that's not likely to change. I'm not so sure that parse
> trees can be regarded as an immutable API.
>
> regards, tom lane

I've completed a proof of concept, so I think I can answer:

- Simpler? Not at all. It requires understanding how the parser
works. The whole thing is about 50 lines long, but quite a bit of
parser magic is going on, and I'm far from confident that it
always does the right thing. I still have to handle
(de)allocations correctly.

- More robust? Yes. The only way to make sure we're lexing the string
the same way Python does is to use its lexer. Every single difference,
however subtle, would be a bug. And it's reinventing the wheel.
But as far as I'm aware there's no way to work at the lexer level,
that is, to add tokens before they are sent to the parser. So you
have to work on the parser output.

- I have no idea if the "node" API is immutable at all. For sure,
the interface I'm using is one or two levels below the current one,
and yes, it's more likely to change. I share your concerns here.
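
To illustrate the "use its lexer" point, here is a minimal sketch that
asks Python's own tokenizer which lines of a function body begin inside
a multi-line token (in practice, a string literal), so that no tab is
inserted there. It uses the Python-level standard-library `tokenize`
module in modern Python 3, which is not the C-level interface discussed
above; the helper name `lines_inside_strings` is hypothetical.

```python
import io
import tokenize


def lines_inside_strings(src):
    """Return the 1-based numbers of lines that begin inside a multi-line token.

    Assumption: the only tokens that span several rows are (pieces of)
    string literals, so these are the lines that must not be re-indented.
    """
    inside = set()
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        start_row, end_row = tok.start[0], tok.end[0]
        # Every row after the first row of the token starts inside it.
        inside.update(range(start_row + 1, end_row + 1))
    return inside


body = 'x = """a\nb\nc"""\nreturn x\n'
print(sorted(lines_inside_strings(body)))
```

The tokenizer only looks at lexical structure, so the bare `return` at
top level is not a problem at this stage.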

If our problem is only string literals, maybe we can handle them
with a dedicated lexer. Python string literals are quite complex
(compared to other languages):

http://docs.python.org/ref/strings.html

but not that hard.
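
As a rough sketch of what such a dedicated lexer might look like, the
function below prefixes each line of a function body with a tab, except
lines that begin inside a string literal. It is written in Python for
readability (plpython.c itself is C), the name `indent_body` is
hypothetical, and it assumes that only quotes, triple quotes, backslash
escapes and `#` comments matter for finding string boundaries.

```python
def indent_body(src, prefix="\t"):
    """Prefix every line of src, skipping lines that start inside a string."""
    out = [prefix]
    i, n = 0, len(src)
    quote = None        # open string delimiter: ' or " or ''' or """
    in_comment = False  # True between '#' and the end of the line
    while i < n:
        ch = src[i]
        if quote is not None:
            if ch == "\\" and i + 1 < n:
                # Keep the escaped character; it cannot close the string.
                out.append(src[i:i + 2])
                i += 2
            elif src.startswith(quote, i):
                out.append(quote)
                i += len(quote)
                quote = None
            else:
                out.append(ch)  # includes newlines: no prefix inside strings
                i += 1
            continue
        if ch == "\n":
            in_comment = False
            out.append(ch)
            if i + 1 < n:
                out.append(prefix)
            i += 1
        elif in_comment:
            out.append(ch)
            i += 1
        elif ch == "#":
            in_comment = True
            out.append(ch)
            i += 1
        elif ch in "'\"":
            # A triple quote opens a long string, otherwise a short one.
            quote = ch * 3 if src.startswith(ch * 3, i) else ch
            out.append(quote)
            i += len(quote)
        else:
            out.append(ch)
            i += 1
    return "".join(out)


body = 's = """one\ntwo"""\nreturn s'
namespace = {}
exec("def f():\n" + indent_body(body), namespace)
print(repr(namespace["f"]()))
```

Skipping the character after a backslash also works for raw strings,
where a backslash still prevents the following quote from closing the
literal.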

Still, my first concern is that one day we'll find another corner case
in Python syntax that makes our "blind" tab adding fail, and we'll be
back to square one.

BTW, I'm not preparing a patch for now; I'm working with a test
program. As soon as I finish it, I'll either post it or prepare
a patch against plpython.c, for consideration. In any case, I won't
say it is ready for inclusion until someone more knowledgeable than
me about both PostgreSQL and Python embedding looks at it.

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo(at)ESI(dot)it
