Re: SQL-Invoked Procedures for 8.1

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SQL-Invoked Procedures for 8.1
Date: 2004-09-23 18:12:18
Message-ID: 200409231112.18941.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Gavin, Neil,

> Following is a proposal to implement what SQL2003 calls 'SQL-Invoked
> Procedures' and what most people refer to as stored procedures. Fujitsu
> will be funding Neil Conway and I to work on this feature.

Which, by the way, is way keen.

My comments are based on having professionally written several hundred
thousand lines of procedural code for PostgreSQL, SQL Server, and Oracle. I
believe that your interpretation of the spec is correct but that there are
several things not covered by the spec, but implemented by other RDBMSes,
which make stored procedures *useful* which have been omitted. I feel
strongly that these things will make a large difference to people thinking of
migrating to PostgreSQL from other DBMSes, and want to make sure that Neil's
implementation does not make them harder, instead of easier, to to implement
later.

> Procedures are nearly identical to functions.

IMHO, this is largely because the spec regards a great deal of SP
functionality to be "implementation-defined", and is thus kept as vague as
possible. In practice, other DBMSes which have both SPs and Functions treat
them *very* differently.

> 3) Procedures can be run in the same savepoint level as the caller when
> OLD SAVEPOINT LEVEL is specified at creation time. According to SQL2003,
> functions must be run on a new savepoint level. From my understanding, we
> do not do this currently.
>
> Work will focus on 1 and 2 until we have the concept of savepoint levels
> with functions. Its possible that we will implement this too if there is
> demand.

One of the things which differentiates SPs on other DBs from PostgreSQL
Functions is transactionality. In SQL Server and Oracle, SPs are not
automatically a transaction; instead, they contain transactions within them.
This is vitally important to DBAs who want to use SPs to automate database
maintenance, loads, transformations, and other activities which require
checkpointing within the course of a program.

For example, I run a nightly data transformation for one client which requires
16 steps with VACUUMs, ANALYZEs and error-handling between them.
Currently, the only way I can implement this for PostgreSQL is to have an
external program (Perl, in my case) manage this and call each step as a
separate function. It would be far easier to manage if I could put all of
the steps, including the vaccums inside one long-running SP, but the required
transaction container prevents this.

> If we go down the route of saying that procedures are a type of function,
> we have the option of allowing users access to OUT and INOUT in functions.
> This would make procedures simply a subset of functions. What do people
> think?

Well, to be frank, my first thought is, why bother? If you're just going to
implement some syntatic sugar on top of the current Function feature, why
bother at all?

Given the opportunity, I would far prefer to set us on a road that would allow
us to replicate -- and exceed -- the functionality of Oracle's PL/SQL. This
proposal does not do that; in fact, if someone were to start implementing
such functionality later they might find this code a stumbling block.

> There will be cases when we need to identify whether a routine is a
> function or a procedure. This could be done two ways. We could say that
> any proc in pg_proc which returns void is a procedure or we could store
> this in some 'protype' column. Thoughts?

Well, see my thoughts above on differentiating SPs from Functions. I
certainly don't think we should be using the same table.

A second point, which I brought up with you on IRC, is to eliminate
overloading and allow named parameter calls on SPs. This is extremely
useful functionality in T-SQL and PL/SQL; in fact, I'd say that it's
essential for any operation that wants to create an SP-centric middleware as
only named parameter calls allow developers to add parameters to existing
procedures without breaking existing calls.

For anyone who doesn't know what I'm talking about, it's this form:

CREATE PROCEDURE do_some_work ( alpha INT, beta INT, gamma TEXT )
etc.

Where you can:
CALL do_some_work( alpha = 5, beta = 7 )
.. and then gamma picks up its default, if any, or even:
CALL do_some_work( gamma = 'Lord Fontleroy', beta = 7, alpha = 1 )

The idea being that for SPs, schema.name is unique regardless of the
parameters. Even if implementing named parameter calls is beyond the
current spec, I will argue strongly in favor of eliminating overloading for
SPs. Overloading serves no purpose for them and prohibits the whole concept
of default values.

> Other databases, and SQL2003, support a few different implementations what
> could be called variables. In SQL2003, there are a few types of
> 'variables':

I completely follow your thinking about variables, and find it makes sense.

> SET VARIABLE <varname> = <val>
> SHOW VARIABLE <varname>

The problem with this is name collisions with the GUC -- especially as it now
allows add-ins to create their own GUC variables. However intuitive SET and
SHOW are, they will lead to problems. Maybe SETVAR and SHOWVAR? Or using
the PL/pgsql assignment operator for the first, and DISPLAY for the second,
e.g.:

num_logos := 917;
DISPLAY num_logos;

Also, you do realize that this is going to lead to requests for SELECT ....
INTO on the SQL command line, don't you?

> The other option is that we do it at the protocol level and modify libpq
> (and psql) to add support. This would allow us something like:

This may be a better approach. I've personally never been comfortable with
the use of variables outside of SPs and packages; it seems orthagonal to the
declaritive nature of SQL. However, this is a aesthic thing and not really
based on practical considerations.

The more practical consideration is, where will OUT and INOUT parameters be
used? Do we want them returned to the SQL session or directly to the
calling client? I would think that practicality would argue in favor of the
latter; I can't see needing variables in SQL except for testing, and having
them in psql will allow me that.

> That might need some massaging but you get the idea. The commands to psql
> translate to lower level protocol level commands which are: create
> variable, set (ie, modify the value of) variable and describe variable.
> Neil thinks we should have create and set in one step. This is great, since
> it covers most cases, but I'm not sure how we have uninitialised values.
> Perhaps we set to NULL?

Yes.

> The only other question (that I can think of now) with respect to
> variables is how they are affected by transactions. My gut feel is
> that they should have transactional semantics.  For example:

I agree strongly with this, especially since we'll be using Savepoints inside
the SPs. Having one's data mods roll back but not the variable values would
be confusing and lead to a *lot* of debugging.

> We can use the same permissions as for functions.

Agreed.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2004-09-23 18:59:20 Table lock on SET STATISTICS
Previous Message Tom Lane 2004-09-23 16:40:41 Re: SQL-Invoked Procedures for 8.1