Rethinking the parameter access hooks for plpgsql's benefit

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Rethinking the parameter access hooks for plpgsql's benefit
Date: 2015-03-09 03:37:34
Message-ID: 4146.1425872254@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

We already knew that plpgsql's setup_param_list() is a bit of a
performance hog. Trying to improve that was the primary motivation
behind commit f4e031c662a6b600b786c4849968a099c58fcce7 (the
bms_next_member business), though that was just micro-optimization
rather than any fundamental rethink of how things are done.

Some benchmarking at Salesforce has shown that there's a really
significant penalty in plpgsql functions that have a lot of local
variables, because large "ndatums" increases the size of the
ParamListInfo array that has to be allocated (and zeroed!) for
each expression invocation.

Thinking about this, it struck me that the real problem is that
the parameter list APIs are still too stuck in the last century,
in that they're optimized for static lists of parameters which
is not what plpgsql needs at all. In fact, if we redefined the
usage of ParamFetchHook so that it would be called on every parameter
fetch, plpgsql could be perfectly happy with just *one* ParamListInfo
slot that it would re-fill each time. This would entirely eliminate
the need for any per-expression-execution ParamListInfo allocation,
because a function could just use one single-entry ParamListInfo array
for all purposes.

The attached proposed patch represents an exploration of this idea.
I've also attached a SQL script with some simple functions for testing
its performance. The first one is for exploring what happens with more
or fewer local variables. I compared current HEAD against the patch
with asserts off, for a call like "SELECT timingcheck(10000000);".
It looks like this (times in ms):

HEAD patch

no extra variables 8695 8390
10 extra variables 9218 8419
30 extra variables 9424 8255
100 extra variables 9433 8202

These times are only repeatable to within a few percent, but they do show
that there is a gain rather than loss of performance even in the base
case, and that the patched code's performance is pretty much independent
of the number of local variables whereas HEAD isn't.

The case where the patch potentially loses is where a single expression
contains multiple references to the same plpgsql variable, since with
HEAD, additional references to a variable don't incur any extra trips
through the ParamFetchHook, whereas with the patch they do. I set up
the reusecheck10() and reusecheck5() functions to see how bad that is.

HEAD patch

5 uses of same variable 4105 3962
10 uses of same variable 5738 6076

So somewhere between 5 and 10 uses of the same variable in a single
expression, you start to lose.

I claim that that's a very unlikely scenario and so this patch should
be an unconditional win for just about everybody.

The patch does create an API break for any code that is using
ParamFetchHooks, but it's an easy fix (just return the address of
the array element you stored data into), and any decent C compiler
will complain about the function type mismatch so the API break
should not go unnoticed.

Objections? Even better ideas?

regards, tom lane

Attachment Content-Type Size
rethink-paramfetchhook-definition.patch text/x-diff 20.4 KB
simple-speed-tests.sql text/x-sql 2.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2015-03-09 03:49:08 Re: Join push-down support for foreign tables
Previous Message Andrew Dunstan 2015-03-09 02:33:18 Re: Bootstrap DATA is a pita