Re: plpgsql function startup-time improvements

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: plpgsql function startup-time improvements
Date: 2018-01-24 10:53:04
Message-ID: CAFj8pRDhphJguE+s-TeKmBEpq3xoL03N_3VM5eFTVZnhy+9PAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

2017-12-27 21:38 GMT+01:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Attached are patches for two performance-improvement ideas that came
> to me while working on
> https://www.postgresql.org/message-id/8962.1514399547@sss.pgh.pa.us
> The three patches are logically independent and could be committed in
> any order. But they touch some overlapping code, so as presented,
> you need to apply that other patch first and then these two in
> sequence.
>

please, can you rebase all three patches necessary for patching?

> The motivation for the first patch is that I noticed that for simple
> plpgsql functions, especially triggers, the per-datum palloc()s performed
> by copy_plpgsql_datum() during function entry amounted to a significant
> fraction of the total runtime. To fix that, the patch simply does one
> palloc for the space needed by all datums, relying on a space calculation
> performed at the end of function compilation by plpgsql_finish_datums().
> This does nothing much for trivial functions with only a few datums, but
> for ones with more, it's a worthwhile savings.
>
> BTW, I experimented with a more drastic solution involving separating
> the "read only" and "writable" parts of PLpgSQL_datum structs and then
> instantiating only the "writable" parts, thus considerably reducing the
> amount of data to be copied during per-call initialization. But that
> was a lot more invasive to the code, and it seemed to be slower than
> what I present here, because performance-critical accesses to variables
> had to compute the addresses of both structs associated with the variable.
> I've not totally given up hope for that idea, but it'll require more
> tuning than I had time for.
>
> In addition to the core idea of the patch, I noticed that there is no
> good reason for PLpgSQL_expr to be treated as a kind of PLpgSQL_datum;
> those structs are never members of the estate->datums[] array, nor is
> there any code expecting them to be structural supersets of PLpgSQL_datum.
> So the patch also removes PLPGSQL_DTYPE_EXPR and the useless fields of
> PLpgSQL_expr.
>
> Also, I changed PLpgSQL_var.isconst and PLpgSQL_var.notnull from "int"
> to "bool", which is what they should have been all along, and relocated
> them in the PLpgSQL_var struct. There are two motivations for this.
> It saves a whole 8 bytes per PLpgSQL_var, at least on 64-bit machines,
> because the fields now fit into what had been wasted padding space;
> reducing the size of what we have to copy during copy_plpgsql_datums
> has to be worth something. Second, those fields are now adjacent to
> the common PLpgSQL_variable fields, which will simplify migrating them
> into PLpgSQL_variable, as I anticipate we'll want to do at some point
> when we allow composite variables to be marked CONSTANT and maybe NOT
> NULL.
>
>
> The idea of the second patch came from noticing that in simple trigger
> functions, quite a large fraction of cycles went into setting up the
> "built in" variables such as tg_name, even though many trigger functions
> probably never read most of those variables. We could improve that by
> not computing the values until/unless they're read. There are various
> names for this technique, but the one that seemed most evocative to me
> was to say that these variables have "promises" attached to them. So
> that's what the patch calls them. We mark the variables with an enum
> indicating which promise needs to be fulfilled for each one, and then
> when about to read a datum, we fulfill the promise if needed.
>
> The method I settled on for that was to invent a separate DTYPE_PROMISE,
> which otherwise is treated exactly like DTYPE_VAR, and to code places
> like exec_eval_datum() like this:
>
> switch (datum->dtype)
> {
> + case PLPGSQL_DTYPE_PROMISE:
> + /* fulfill promise if needed, then handle like regular var */
> + plpgsql_fulfill_promise(estate, (PLpgSQL_var *) datum);
> +
> + /* FALL THRU */
> +
> case PLPGSQL_DTYPE_VAR:
> {
> PLpgSQL_var *var = (PLpgSQL_var *) datum;
>
> The extra DTYPE is a little bit grotty, but it's not awful. One
> alternative I experimented with was to just treat these variables
> as plain DTYPE_VAR, requiring coding like
>
> case PLPGSQL_DTYPE_VAR:
> {
> PLpgSQL_var *var = (PLpgSQL_var *) datum;
>
> + if (unlikely(var->promise != PLPGSQL_PROMISE_NONE))
> + plpgsql_fulfill_promise(estate, var);
> +
> *typeid = var->datatype->typoid;
> *typetypmod = var->datatype->atttypmod;
> *value = var->value;
>
> However, this way is injecting an additional test-and-branch into
> hot code paths, and it was demonstrably slower.
>
> With these patches, I see performance improvements of 10% to 20%
> on simple but not totally unrealistic triggers, for example
>
> create or replace function mytrig() returns trigger language plpgsql as
> $$
> begin
> if (new.f1 != new.f2) or (new.f3 != new.f4) then
> new.f3 = 42;
> end if;
> return new;
> end$$ stable;
>
> (BTW, those are percentages of total INSERT runtime, not just of
> the trigger proper; though I cheated to the extent of using a
> temp not regular table.)
>
> It seems possible that the "promise" technique could be useful for
> other plpgsql special variables in future. I thought briefly about
> applying it to triggers' NEW and OLD arguments, but desisted because
> (a) it's only a win if triggers will commonly not touch the variable,
> which seems unlikely to be true for NEW/OLD; and (b) it would have
> required infrastructure for attaching a promise to a DTYPE_REC
> variable, which was more pain than I wanted. But I wonder if it'd
> be useful for, say, the special variables that exception blocks create.
>
> I'll add this to the January commitfest.
>

All mentioned ideas has sense.

Regards

Pavel

> regards, tom lane
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2018-01-24 11:28:44 Re: Is it valid to have logical replication between 2 databases on the same postgres server?
Previous Message Michael Paquier 2018-01-24 10:10:40 Re: [HACKERS] Refactoring identifier checks to consistently use strcmp