Controlling changes in plpgsql variable resolution

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Controlling changes in plpgsql variable resolution
Date: 2009-10-18 17:25:43
Message-ID: 7933.1255886743@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

As most of you will recall, plpgsql currently acts as though identifiers
in SQL queries should be resolved first as plpgsql variable names, and
only failing that do they get processed as names of the query. The
plpgsql parser rewrite that I'm working on will fix that for the
obviously-silly cases where a plpgsql variable is substituted for a
table name or some other non-scalar-variable identifier. However, what
should we do when a name could represent either a plpgsql variable
or a column of the query? Historically we've resolved it as the
plpgsql variable, but we've sure heard a lot of complaints about that.
Oracle's PL/SQL has the precedence the other way around: resolve first
as the query column, and only failing that as a PL variable. The Oracle
behavior is arguably less surprising because the query-provided names
belong to the nearer enclosing scope. I believe that we ought to move
to the Oracle behavior over time, but how do we get there from here?
Changing it is almost surely going to break a lot of people's functions,
and in rather subtle ways.

I think there are basically three behaviors that we could offer:

1. Resolve ambiguous names as plpgsql (historical PG behavior)
2. Resolve ambiguous names as query column (Oracle behavior)
3. Throw error if name is ambiguous (useful for finding problems)

(Another possibility is to throw a warning but proceed anyway. It would
be easy to do that if we proceed with the Oracle behavior, but *not*
easy if we proceed with the historical PG behavior. The reason is that
the code invoked by transformColumnRef may have already made some
side-effects on the query tree. We discussed the implicit-RTE behavior
yesterday, but there are other effects of a successful name lookup,
such as marking columns for privilege checking.)

What I'm wondering about at the moment is which behaviors to offer and
how to control them. The obvious answer is "use a GUC" but that answer
scares me because of the ease with which switching between #1 and #2
would break plpgsql functions. It's not out of the question that that
could even amount to a security problem. I could see using a GUC to
turn the error behavior (#3) on and off, but not to switch between #1
and #2.

Another possibility is to control it on a per-function basis by adding
some special syntax to plpgsql function bodies to say which behavior
to use. We could for instance extend the never-documented "#option"
syntax. This is pretty ugly and would be inconvenient to use too
--- if people have to go and add "#option something" to a function,
they might as well just fix whatever name conflicts it has instead.

I'm not seeing any choice that seems likely to make everybody happy.
Any comments or ideas?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-10-18 18:50:43 Boosting cost estimates for some built-in functions
Previous Message Ron Mayer 2009-10-18 16:17:27 Re: Rejecting weak passwords