Re: WIP patch: convert SQL-language functions to return tuplestores

From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: convert SQL-language functions to return tuplestores
Date: 2008-10-28 14:26:42
Message-ID: 603c8f070810280726k7fd01877l16bf7d8233a8a036@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>> I always thought we considered that a bug though. It sure would be nice if we
>> could generate results as needed instead of having to generate them in advance
>> and store all of them.
> I suppose, but short of a fundamental rethink of how PL functions work
> that's not going to happen. There's also the whole issue of when do
> side-effects happen (such as before/after statement triggers).

For PL/pgsql, I think it might be possible to execute a function to
precisely the point where you have generated a sufficient number of
records. In other words, when someone asks for a tuple, you start
executing the function until a tuple pops out, and then save the
execution context until someone asks for another. Conceivably you can
push LIMIT and WHERE clauses down into any RETURN QUERY statements
executed, as well. Maybe that qualifies as a fundamental rethink,
though, and we can worry about how to suppress the tuplestore in that
case when and if someone is prepared to implement it. For other
procedural languages, you would need support from the executor for
that PL, which in most cases will probably be lacking.

<thinks a little more>

In fact, I suspect that you would gain a lot by optimizing
specifically for the case of a PL/pgsql function of the form: (1)
execute 0 or more statements that may or may not have side effects but
do not return any tuples, (2) execute exactly 1 RETURN QUERY
statement, and (3) implicit or explicit RETURN. I suspect that's a
very common usage pattern, and it wouldn't require being able to save
the entire execution context at an arbitrary point.

(I agree that BEFORE/AFTER statement triggers are a problem here but
I'm not sure that they are an insoluble one, and I'd hate for that to
be the thing that kills this type of optimization. Even if you
implemented a full-blown partial-execution model, it would be
reasonable to always run any particular INSERT/UPDATE/DELETE to
completion. It's really SELECT that is the problem.)

>> In particular I fear there are a lot of places that use functions where we
>> might expect them to use views. They're never going to get really good plans
>> but it would be nice if we could at least avoid the extra materialize steps.
> Agreed, but I think the fundamental solution there, for simple-select
> functions, is inlining.

+1. Upthread passing LIMIT and OFFSET clauses into the SRF as
parameters was suggested, but that's really intractable for real-world
use where you are also applying WHERE clauses to the SRF results.

...Robert

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua D. Drake 2008-10-28 14:50:30 Re: Proposal of PITR performance improvement for 8.4.
Previous Message Heikki Linnakangas 2008-10-28 14:22:25 Updating FSM on recovery