Re: Letting plpgsql in on the fun with the new expression eval stuff

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Letting plpgsql in on the fun with the new expression eval stuff
Date: 2017-12-20 20:44:18
Message-ID: 27734.1513802658@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Will send a patch in a bit. I need to write an explanation of what all
> I changed.

OK then. What the attached patch does is:

* Create a new step type EEOP_PARAM_CALLBACK (if anyone has a better
naming idea, I'm receptive) and add the infrastructure needed for
add-on modules to generate that. As discussed, the best control
mechanism for that, at least for the immediate use, seems to be to
add another hook function to ParamListInfo, which will be called by
ExecInitExpr if it's supplied and a PARAM_EXTERN Param is found.
For stand-alone expressions, we add a new entry point to allow the
ParamListInfo to be specified directly rather than found in the
parent plan node's EState.

* In passing, remove the "parent" pointer from the arguments to
ExecInitExprRec and related functions, instead storing that pointer
in a transient field in ExprState. There are now several transient
fields there. We discussed splitting them out to a different struct,
but that seems like material for a different patch. I had to do
something here, though, since otherwise I'd have had to pass down
the stand-alone ParamListInfo as another parameter :-(.

* Redesign the API for the ParamListInfo paramFetch hook so that the
ParamExternData array can be entirely virtual. Typical access to
the info about a PARAM_EXTERN Param now looks like

if (paramInfo->paramFetch != NULL)
prm = paramInfo->paramFetch(paramInfo, paramId, ...);
else
prm = &paramInfo->params[paramId - 1];

so that if paramFetch is defined, the core code no longer touches
the params[] array at all, and it doesn't have to be there.
(It still can be there, if paramFetch wants to use it; but the sole
existing user of the hook, plpgsql, doesn't want to.)

* This also lets us get rid of ParamListInfo.paramMask, instead
leaving it to the paramFetch hook to decide which param IDs should
be accessible or not. plpgsql_param_fetch was already doing the
identical masking check, so having callers do it too seemed quite
redundant and overcomplex.

* While I was at it, I added a "speculative" flag to paramFetch
that the planner can specify as TRUE to avoid unwanted failures.
This solves an ancient problem for plpgsql that it couldn't provide
values of non-DTYPE_VAR variables to the planner for fear of
triggering premature "record not assigned yet" or "field not found"
errors during planning.

* Rework plpgsql to get rid of the need for "unshared" parameter lists,
by dint of turning the single ParamListInfo per estate into a nearly
read-only data structure that doesn't instantiate any per-variable data.
Instead, the paramFetch hook controls access to per-variable data and can
make the right decisions on the fly, replacing the cases that we used to
need multiple ParamListInfos for. This might perhaps have been a
performance loss on its own, but by using a paramCompile hook we can
bypass plpgsql_param_fetch entirely during normal query execution.
(It's now only called when, eg, we copy the ParamListInfo into a cursor
portal. copyParamList() or SerializeParamList() effectively instantiate
the virtual parameter array as a simple physical array without a
paramFetch hook, which is what we want in those cases.) This allows
reverting most of commit 6c82d8d1f, though I kept the cosmetic
code-consolidation aspects of that (eg the assign_simple_var function).

* During compilation of a PARAM_EXTERN expression node, predetermine
as much as we can to select one of several execution routines.

I've done light performance testing on this, mainly comparing the
runtimes for the test functions shown in the second attachment.
I see overall gains that are modest but reproducible (in the range
of a couple percent) for the "row" (named composite type) cases,
and more significant -- around 10% -- for the "record" cases.
I attribute the difference to the fact that the "row" cases use DTYPE_VAR
variables for the fields, which were already pretty well optimized,
whereas the "record" cases use DTYPE_RECFIELD variables which invoked
all that overhead we discussed. The fact that this isn't losing
on DTYPE_VAR convinces me that it should be a win in all cases.

It'd theoretically be possible to split this into three patches,
one to change the stuff around ExecInitExpr, one to rejigger the
ParamListInfo API, and one to get rid of unshared param lists in
plpgsql. However, that would require writing some throwaway code
in plpgsql, so I don't especially want to bother.

I have another patch in the pipeline that interacts with this,
so I'd kind of like to get this committed sooner rather than later.

regards, tom lane

Attachment Content-Type Size
redesign-plpgsql-parameter-passing-1.patch text/x-diff 94.1 KB
test-record-and-row-access-speed.sql text/plain 1.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2017-12-20 20:45:29 Re: Tracking of page changes for backup purposes. PTRACK [POC]
Previous Message Pavel Stehule 2017-12-20 20:29:19 Re: Tracking of page changes for backup purposes. PTRACK [POC]