plpgsql function startup-time improvements

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: plpgsql function startup-time improvements
Date: 2017-12-27 20:38:34
Message-ID: 11986.1514407114@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Attached are patches for two performance-improvement ideas that came
to me while working on
https://www.postgresql.org/message-id/8962.1514399547@sss.pgh.pa.us
The three patches are logically independent and could be committed in
any order. But they touch some overlapping code, so as presented,
you need to apply that other patch first and then these two in
sequence.

The motivation for the first patch is that I noticed that for simple
plpgsql functions, especially triggers, the per-datum palloc()s performed
by copy_plpgsql_datum() during function entry amounted to a significant
fraction of the total runtime. To fix that, the patch simply does one
palloc for the space needed by all datums, relying on a space calculation
performed at the end of function compilation by plpgsql_finish_datums().
This does nothing much for trivial functions with only a few datums, but
for ones with more, it's a worthwhile savings.

BTW, I experimented with a more drastic solution involving separating
the "read only" and "writable" parts of PLpgSQL_datum structs and then
instantiating only the "writable" parts, thus considerably reducing the
amount of data to be copied during per-call initialization. But that
was a lot more invasive to the code, and it seemed to be slower than
what I present here, because performance-critical accesses to variables
had to compute the addresses of both structs associated with the variable.
I've not totally given up hope for that idea, but it'll require more
tuning than I had time for.

In addition to the core idea of the patch, I noticed that there is no
good reason for PLpgSQL_expr to be treated as a kind of PLpgSQL_datum;
those structs are never members of the estate->datums[] array, nor is
there any code expecting them to be structural supersets of PLpgSQL_datum.
So the patch also removes PLPGSQL_DTYPE_EXPR and the useless fields of
PLpgSQL_expr.

Also, I changed PLpgSQL_var.isconst and PLpgSQL_var.notnull from "int"
to "bool", which is what they should have been all along, and relocated
them in the PLpgSQL_var struct. There are two motivations for this.
It saves a whole 8 bytes per PLpgSQL_var, at least on 64-bit machines,
because the fields now fit into what had been wasted padding space;
reducing the size of what we have to copy during copy_plpgsql_datums
has to be worth something. Second, those fields are now adjacent to
the common PLpgSQL_variable fields, which will simplify migrating them
into PLpgSQL_variable, as I anticipate we'll want to do at some point
when we allow composite variables to be marked CONSTANT and maybe NOT
NULL.

The idea of the second patch came from noticing that in simple trigger
functions, quite a large fraction of cycles went into setting up the
"built in" variables such as tg_name, even though many trigger functions
probably never read most of those variables. We could improve that by
not computing the values until/unless they're read. There are various
names for this technique, but the one that seemed most evocative to me
was to say that these variables have "promises" attached to them. So
that's what the patch calls them. We mark the variables with an enum
indicating which promise needs to be fulfilled for each one, and then
when about to read a datum, we fulfill the promise if needed.

The method I settled on for that was to invent a separate DTYPE_PROMISE,
which otherwise is treated exactly like DTYPE_VAR, and to code places
like exec_eval_datum() like this:

switch (datum->dtype)
{
+ case PLPGSQL_DTYPE_PROMISE:
+ /* fulfill promise if needed, then handle like regular var */
+ plpgsql_fulfill_promise(estate, (PLpgSQL_var *) datum);
+
+ /* FALL THRU */
+
case PLPGSQL_DTYPE_VAR:
{
PLpgSQL_var *var = (PLpgSQL_var *) datum;

The extra DTYPE is a little bit grotty, but it's not awful. One
alternative I experimented with was to just treat these variables
as plain DTYPE_VAR, requiring coding like

case PLPGSQL_DTYPE_VAR:
{
PLpgSQL_var *var = (PLpgSQL_var *) datum;

+ if (unlikely(var->promise != PLPGSQL_PROMISE_NONE))
+ plpgsql_fulfill_promise(estate, var);
+
*typeid = var->datatype->typoid;
*typetypmod = var->datatype->atttypmod;
*value = var->value;

However, this way is injecting an additional test-and-branch into
hot code paths, and it was demonstrably slower.

With these patches, I see performance improvements of 10% to 20%
on simple but not totally unrealistic triggers, for example

create or replace function mytrig() returns trigger language plpgsql as
$$
begin
if (new.f1 != new.f2) or (new.f3 != new.f4) then
new.f3 = 42;
end if;
return new;
end$$ stable;

(BTW, those are percentages of total INSERT runtime, not just of
the trigger proper; though I cheated to the extent of using a
temp not regular table.)

It seems possible that the "promise" technique could be useful for
other plpgsql special variables in future. I thought briefly about
applying it to triggers' NEW and OLD arguments, but desisted because
(a) it's only a win if triggers will commonly not touch the variable,
which seems unlikely to be true for NEW/OLD; and (b) it would have
required infrastructure for attaching a promise to a DTYPE_REC
variable, which was more pain than I wanted. But I wonder if it'd
be useful for, say, the special variables that exception blocks create.

I'll add this to the January commitfest.

regards, tom lane

Attachment Content-Type Size
faster-plpgsql-datum-setup-1.patch text/x-diff 14.7 KB
plpgsql-promises-1.patch text/x-diff 31.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-12-27 21:31:12 Re: pgsql: Get rid of copy_partition_key
Previous Message Antonio Belloni 2017-12-27 20:18:27 Contributing with code