Re: Letting plpgsql in on the fun with the new expression eval stuff

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Letting plpgsql in on the fun with the new expression eval stuff
Date: 2017-12-20 13:56:02
Message-ID: 20171220135602.4ixfvaqpeb6z7dil@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Cool to see you looking at that, I think there's quite some optimization
potential around. I've to reread a bunch of plpgsql code, it's not
exactly an area of the code I'm intimately familiar with.

On 2017-12-19 13:00:41 -0500, Tom Lane wrote:
> I'm looking at ways to get plpgsql expression evaluation to go faster,
> and one thing I'm noticing is the rather large overhead of going through
> ExecEvalParamExtern and plpgsql_param_fetch to get to the useful work
> (exec_eval_datum).

What's the workload you're testing? I'm mildly surprised to see
ExecEvalParamExtern() show up, rather than just plpgsql_param_fetch() &
exec_eval_datum(). Or were you just listing that to specify the
callpath?

> We've ameliorated that for DTYPE_VAR variables
> by keeping a pre-set-up copy of their values in a ParamListInfo struct,
> but that's pretty ugly and carries a bunch of costs of its own.

Just to make sure I understand correctly, you're talking about
setup_unshared_param_list() / setup_param_list()?

> What I'm wondering about, given the v10 redesign of expression evaluation,
> is whether we couldn't be smarter about this by allowing plpgsql (or other
> external users) to skip the ParamListInfo representation altogether, and
> instead compile Param references into calls to evaluation functions that
> are better tailored to the problem of fetching the desired value.

Yea, that seems to make sense.

> In the existing execution infrastructure, what seems to make sense is to
> have an ExprEvalStep type that has functionality like EEOP_PARAM_EXTERN,
> but includes a function pointer to a plpgsql-supplied function having the
> same signature as ExecEvalParamExtern. So the execution would look more
> or less like
>
> EEO_CASE(EEOP_PARAM_CALLBACK)
> {
> op->eval_param(state, op, econtext);
> EEO_NEXT();
> }
>
> and there'd need to be some extra fields (at least a void*) in the op
> struct where plpgsql could keep private data.

I think I'd redo the parameters to the callback slightly, but generally
that sounds sane. Was thinking of something more like

One question I have is how we will re-initialize the relevant state
between exec_simple_expr() calls. I guess the most realistic one is that
the op will have a pointer into an array managed by exec_simple_expr()
that can get reset?

> The JIT stuff you're working on could just compile an equivalent of the
> above, although in the very long term maybe there would be some advantage
> to letting add-on modules compile specialized code for such steps.

Yea, that's reasonable too. I'm not yet 100% sure whether it'll not be
more reasonable to add a few more generic opcodes to the central JITing
that external code can emit and core can handle, or a hook like you
describe is better.

> The immediate problem is how can ExecInitExpr generate such a step?
> It can't itself know what to put into the function ptr or the additional
> fields. There has to be a way for it to call a plpgsql-supplied
> support routine that can construct the eval step. (And we have to
> export ExprEvalPushStep, though that doesn't seem like a problem.)

Hm. We could have a version of ExecInitExpr() / ExecInitExprRec that
first gives a callback chance to handle an operation. That'd make
overwriting parts like this quite easy, because we know we want to
handle Param nodes anywhere differently. So we'd not have a Param
specific routine, but the ability to intercept everything, and defer to
ExecInitExprRec() if undesired.

> For compiling full-fledged query trees, what I think we could do is
> add a method (function pointer) to ParamListInfo and have ExecInitExpr
> invoke plan->state->es_param_list_info->compile_param if that's set.
> However, that solution doesn't immediately work for compiling simple
> expressions because we pass a null "parent" pointer when building
> those.
>
> I thought about instantiating a dummy PlanState and EState to use
> just for carrying this info, but that seems pretty ugly.

Yea, not a fan.

> Another way we could do it is to invent ExecInitExprWithParams() that
> takes an additional ParamListInfo pointer, and use that. Rather than
> adding yet one more parameter that has to be passed down through
> ExecInitExprRec, I suggest that we could waste a bit of space in
> struct ExprState and store that value there. Maybe do the same with
> the parent pointer so as to reduce the number of recursive parameters.

I was thinking something slightly different. Namely that we should have
two structs ExprState and ExprBuildState. We can stuff random additions
to ExprBuildState without concerns about increasing ExprState's state.

> I've not written any code around this idea yet, and am not sure
> if it conflicts with what you're trying to do for JIT or further out.
> Comments, better ideas?

I so far see little reason for concern WRT JIT. Implementing support for
a expression step like you describe above is a few minutes worth of
work. There might be some annoying rebasing if the patches conflict, but
even that ought to be manageable.

There *are* some longer term implications that could theoretically
become relevant, although I'm not sure problematic. In a good number of
workloads the initialization of expression steps and executor trees is
the bottleneck. The biggest culprit is tupledesc computations, but also
the expression building. With the latter the problem obviously becomes
a lot "bigger" with JITing - we don't want to recompile functions all
the time. The point where JIT becomes beneficial is a lot lower if
you've to do it only once per prepared statement, rather than once per
query execution... So I'm basically saying that in my opinion more
information has to be built at plan time, long term.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Wildish 2017-12-20 14:02:20 Re: AS OF queries
Previous Message Konstantin Knizhnik 2017-12-20 13:48:35 Re: AS OF queries