Where are we on stored procedures?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org, pgsql-jdbc(at)postgreSQL(dot)org
Subject: Where are we on stored procedures?
Date: 2005-02-24 20:13:15
Message-ID: 26903.1109275995@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-jdbc

Gavin and Neil made some noise in late September about implementing
stored procedures for PG 8.1, but I haven't heard anything more about
it since that thread died off. I've been getting some pressure inside
Red Hat to see us support more of the JDBC CallableProcedure spec, so
I'd like to reopen the discussion.

In the previous discussion starting here:
http://archives.postgresql.org/pgsql-hackers/2004-09/msg00702.php
it seemed that we were bandying around several different issues.
People wanted "procedures" to differ from our current implementation
of "functions" in such ways as:

1. Executing outside the database engine, and therefore being able to
start/commit transactions. (This is *fundamentally* different from our
current concept of functions, and I think that any implementation that
tries to gloss over the difference will be doomed to failure.)

2. Having OUT parameters, and perhaps also INOUT parameters. At least
in the JDBC spec, these are seen as scalar values, and so the feature
is really just syntactic sugar for returning a composite type or row
value. For instance, a procedure foo(x IN int, y OUT text, z OUT float)
could perhaps be called via
SELECT y, z FROM foo(42);
where foo(x) is seen as returning the rowtype (y text, z float).

3. Being able to return multiple result sets, ie, more than one SETOF
something. In our previous discussion we tied this to OUT parameters
but they're not necessarily the same thing --- the JDBC spec sees result
sets as totally different objects.

4. Not having a distinguished function result, a/k/a returning void.
While a function result is unnecessary given OUT parameters, this feels
like an accidental thing rather than an essential aspect.

5. Matching parameters by name instead of by position.

6. Allowing parameters to be omitted from a call, with default values
used instead.

#5 and #6 would also be interesting for regular functions, but it's
unclear how well we can support them without disallowing overloading
of procedure/function names --- which of course is a nonstarter for
the existing function facility.

(Gavin and Neil's first proposal also involved inventing a concept of
variables at the SQL level, primarily so that there would be something
to receive the results of OUT parameters. I found that distasteful and
would prefer to avoid it. Another thing that came up was allowing a
procedure -- in one or more of these senses -- to be used as a trigger,
but I think that's a red herring. None of the above attributes are
particularly relevant to a trigger.)

On looking at this list, it seems to me that #1 (transactionality) is
largely orthogonal to the other points, which all have to do with
passing and returning values. The main reason why we might consider
all these things together is that no one is very excited about the idea
of having even more than two kinds of objects, so there is a lot of
temptation to press for having all these features in "procedures"
rather than recognize that they are largely separable needs.

The more I think about it, the more I think that #1 belongs outside the
database entirely. The database engine more or less has to operate
entirely within transactions --- heck, we cannot even look up a stored
procedure's definition in a system catalog without starting a
transaction. This is not to say that the facility can't exist
physically within the backend, but that it would work a lot better if
it weren't logically inside the database. What about inventing a
protocol facility that lets clients send a chunk of, say, Perl or
Python code to execute in an interpreter that can in turn send commands
to the DB engine proper? The point here is that that interpreter is
wrapped around the DB engine, not vice versa as occurs when executing a
plperl or plpython function. (The only real difference between this
idea and just executing the same code on the client side is avoiding
network round trips.)

BTW, using plpgsql in this fashion is a nonstarter, at least with
anything resembling its current implementation. Because plpgsql relies
on the database engine to do even simple expression evaluation, it's
just hopeless to think of it doing anything useful outside a
transaction. But we have plenty of external programming languages
available that are perfectly capable of doing their own arithmetic and
logic, and so could meaningfully drive the database engine through a
series of transactions.

Having said all that, I don't have any personal interest in pursuing #1
immediately. (Though anyone who does is welcome to.) What I would
like to see is some forward movement on the other points, particularly
#2 which is blocking my Red Hat coworkers from making progress. So the
real bottom line to this overly long email is that I'd like to create
a consensus that it's OK to work on #2 and perhaps #3 in the context of
our existing function facility, without tackling #1. This'd involve
work in both the server and the JDBC driver.

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-02-24 20:29:20 Re: Some download statistics
Previous Message Troels Arvin 2005-02-24 20:11:54 Re: Some download statistics

Browse pgsql-jdbc by date

  From Date Subject
Next Message Francisco Figueiredo Jr. 2005-02-24 21:49:35 Re: [JDBC] Where are we on stored procedures?
Previous Message Markus Schaber 2005-02-24 18:22:21 Development using Eclipse