Re: Some notes about Param handling with "Oracle style" plpgsql variables

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Some notes about Param handling with "Oracle style" plpgsql variables
Date: 2009-11-02 17:01:40
Message-ID: 162867790911020901l363878f3le9dac9c4768b897b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2009/11/2 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> One of the interesting properties of Oracle-compatible variable
> references in plpgsql is that the set of variables referenced by a
> given query could change during a forced replan.  For example,
> consider
>
>        declare x int;
>                r record;
>        ...
>        for r in select x,y from tab loop ...
>
> If tab contains a column "x" then the "x" reference in the SELECT
> refers to tab.x; if not, it refers to the plpgsql variable x.
> So when first executing the SELECT we might find that it requires
> a Param reference to the plpgsql variable, and then after a replan
> is forced by ALTER TABLE tab ADD COLUMN x, there is no need for
> the Param anymore.  Or vice versa.
>
> This kinda calls into question whether the Oracle way is actually
> a good idea or not; but my purpose here is not to debate that,
> just to look at what it takes to implement it.
>
> Currently, plpgsql generates a list of the variables referenced by
> any SQL statement or expression immediately upon seeing the text,
> before it's ever even fed to the core parser.  I had been envisioning
> having the parser callback hook construct the list on-the-fly during
> parsing, but the possibility that the list will change from time to
> time means that other changes are needed too.  Notably:
>
> 1. plancache.c does not have any provision for letting the Param type
> array associated with a stored statement change when the statement is
> replanned due to SI invalidation.
>
> 2. The control flow for a replan is that plpgsql calls SPI_execute_plan,
> which calls RevalidateCachedPlan, which does the replan if the cached
> plan is discovered to be stale.  However, plpgsql already had to set up
> the list of actual parameter values before it called SPI_execute_plan,
> which means it is *way* too late to change the list of required Params
> even if plancache let us do it.
>
> After chewing on these facts for awhile, I am thinking that the best
> solution is for plpgsql to abandon the notion of a predetermined list
> of parameters for a SQL query altogether.  What that list basically
> provides is a mapping from Param numbers ($n) to plpgsql "datum numbers"
> (indexes in the list of a plpgsql function's variables).  We could make
> that mapping always be one-to-one, since there's no real reason that the
> Params available to a query have to be consecutively numbered.  So the
> transformColumnRef hook would just pass back a Param using the
> referenced variable's datum number as paramid; it wouldn't bother at all
> with building a data structure listing the specific variables actually
> used in the query.
>
> As far as plancache goes, it would therefore always see a null array
> of Param type OIDs associated with a plpgsql-generated query, and we'd
> not have to provide a way to update that.  (We'd still keep the ability
> to store such an array, because most other callers of plancache will
> still want a fixed list of Params.)  What we'd have to add to plancache
> instead is the ability to install caller-determined parser callback
> hooks when it is calling the parser for a replan.  This seems fairly
> easy to do --- I'm envisioning a sort of meta-hook function that gets
> called with the new ParseState and can insert hook function pointers
> in it.
>
> The other issue with this is what to do at runtime.  We could do it
> with no other changes if we had plpgsql always set up Values/Nulls
> arrays listing *every* datum's current value.  This seems a bit
> brute-force though --- it could be slow in a function with a lot of
> variables, and in most cases any specific query or expression would
> not need most of those values.  What I think we should do instead
> is extend the ParamListInfo structure to add a callback hook function
> that populates individual ParamExternData array entries on-demand.
> The core executor would call the hook when it tried to fetch the
> value of a Param that was currently invalid (ptype == 0).  So the
> hook would be invoked only once per query per referenced parameter,
> which shouldn't be much overhead.  Another interesting property
> of this approach is that it'd fix the longstanding user complaint
> that constructions like
>        if (TG_OP = 'INSERT' and NEW.foo = 'bar') ...
> fail prematurely.  The executor would never demand the value
> of NEW.foo, and thus not fail, if TG_OP isn't INSERT.
>

good idea.

regards
Pavel

> Comments?
>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Steve Atkins 2009-11-02 17:06:00 Re: Error on compile for Windows
Previous Message Alexey Klyukin 2009-11-02 16:46:53 Re: PL/Perl backed crashed during spi_exec_query