Re: [WIP] Caching constant stable expressions per execution

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] Caching constant stable expressions per execution
Date: 2011-09-11 21:22:26
Message-ID: 4736.1315776146@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> On Sun, Sep 11, 2011 at 01:51, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The patch as given has a bunch of implementation issues

> This is my first patch that touches the more complicated internals of
> Postgres. I'm sure I have a lot to learn. :)

Well, people seem to think that this is worth pursuing, so here's a
couple of thoughts about what needs to be done to get to something
committable.

First off, there is one way in which you are cheating that does have
real performance implications, so you ought to fix that before trusting
your performance results too much. You're ensuring that the cached
datum lives long enough by doing this:

+ /* This cache has to persist for the whole query */
+ oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+
+ fcache->cachedResult = ExecMakeFunctionResult(fcache, econtext, isNull, isDone);
+ fcache->cachedIsNull = *isNull;
+
+ /* Set-returning functions can't be cached */
+ Assert(!isDone || *isDone == ExprSingleResult);
+
+ MemoryContextSwitchTo(oldcontext);

IMO this is no good because it means that every intermediate result
computed within the cacheable expression will be leaked into
per_query_memory. Yeah, you're only doing it once, but once could still
be too much. Consider for instance the case where the function
internally generates a lot of cruft over multiple operations, and it
thinks it's cleaning up by resetting ecxt_per_tuple_memory every so
often. If CurrentMemoryContext isn't pointing to ecxt_per_tuple_memory,
this loses. I think what you need to do is run the function in the
normal environment and then use datumCopy() to save the value into
per_query_memory. The reason this is performance-relevant is that the
datum copy step represents real added cycles. I think it probably
doesn't invalidate the idea, but it'd be good to fix it and recheck
your performance numbers before putting in more work. Assuming
that passes ...

The concept you're trying to encapsulate here is not really specific to
FuncExpr or OpExpr nodes. Rather, the idea you want to implement is
"let's cache the result of any expression tree that contains no Vars,
internal Params, or volatile functions". An example of this is that
the result of
CASE WHEN f1() > 42 THEN f2() ELSE NULL END
ought to be perfectly cacheable if f1 and f2 (and the > operator) are
stable or immutable. Now it doesn't seem like a good plan to me to
plaster "stableconst" flags on every expression node type, nor to
introduce logic for handling that into everything in execQual.c.
So what I suggest is that you should invent a new expression node
type CacheExpr (that's just the first name that came to mind, maybe
somebody has a better idea) that takes an expression node as input
and caches the result value. This makes things simple and clean in
the executor. The planner would have to figure out where to inject
CacheExpr nodes into expression trees --- ideally only the minimum
number of nodes would be added. I think you could persuade
eval_const_expressions to do that, but it would probably involve
bubbling additional information back up from each level of recursion.
I haven't thought through the details.

The other thing that is going to be an issue is that I'm fairly sure
this breaks plpgsql's handling of simple expressions. (If there's not
a regression test that the patch is failing, there ought to be ...)
The reason is that we build an execution tree for a given simple
expression only once per transaction and then re-use it. So for
example consider a plpgsql function containing

x := stablefunction();

I think your patch means that stablefunction() would be called only once
per transaction, and the value would be cached and returned in all later
executions. This would be wrong if the plpgsql function is called in
successive statements that have different snapshots, or contains a loop
around the assignment plus operations that change whatever state
stablefunction() looks at. It would be legitimate for stablefunction()
to have different values in the successive executions.

The quick and dirty solution to this would be for plpgsql to pass some
kind of planner flag that disables insertion of CacheExpr nodes, or
alternatively have it not believe that CacheExpr nodes are safe to have
in simple expressions. But that gives up all the advantage of the
concept for this use-case, which seems a bit disappointing. Maybe we
can think of a better answer.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2011-09-12 01:40:13 Re: superusers are members of all roles?
Previous Message Devrim GÜNDÜZ 2011-09-11 19:44:30 Re: Alpha 1 for 9.2