Re: Manipulating complex types as non-contiguous structures in-memory

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Manipulating complex types as non-contiguous structures in-memory
Date: 2015-02-17 04:54:34
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Attached is an 0.3 version, rebased over today's HEAD changes (applies to
commit 9e3ad1aac52454569393a947c06be0d301749362 or later), and with some
better logic for transferring expanded array values into and out of plpgsql
functions. Using this example:

create or replace function arraysetnum(n int) returns numeric[] as $$
declare res numeric[] := '{}';
for i in 1 .. n loop
res[i] := i;
end loop;
return res;
$$ language plpgsql strict;

create or replace function arraysumnum(arr numeric[]) returns numeric as $$
declare res numeric := 0;
for i in array_lower(arr, 1) .. array_upper(arr, 1) loop
res := res + arr[i];
end loop;
return res;
$$ language plpgsql strict;

create or replace function arraytimenum(n int) returns numeric as $$
declare tmp numeric[];
tmp := arraysetnum(n);
return arraysumnum(tmp);
$$ language plpgsql strict;

either of the test cases

select arraysumnum(arraysetnum(100000));
select arraytimenum(100000);

involve exactly one coercion from flat to expanded array (during the
initial assignment of the '{}' constant to the "res" variable), no
coercions from expanded to flat, and no bulk copy operations.

So I'm starting to feel fairly good about this. Obviously there's a
nontrivial amount of work to do with integrating the array-code changes
and teaching the rest of the array functions about expanded arrays (or
at least as many of them as seem performance-critical). But that looks
like just a few days of basically-mechanical effort. A larger question
is what we ought to do about extending the array-favoring hacks in plpgsql
to support this type of optimization for non-built-in types.

Realize that what this patch is able to improve are basically two types
of cases:

* nests of function calls that take and return the same complex datatype,
think foo(bar(baz(x))), where x is stored in some flat format but foo()
bar() and baz() all agree on an expanded format that's easier to process.

* plpgsql variables stored in an expanded format that's easier to process
for most functions that might work with their values.

The first case can be implemented by mutual agreement among the functions
of the datatype; it does not need any additional help beyond what's in
this patch. But the second case does not work very well unless plpgsql
takes some proactive step to force variable values into the expanded
format. Otherwise you get a win only if the last assignment to the
variable happened to come from a source that supplied a read-write
expanded value. You can make that happen with appropriate coding in
the plpgsql function, of course, but it's klugy to have to do that.

I would not be ashamed to ship this in 9.5 as just an array optimization
and leave the larger question for next time ... but it does feel a bit
unfinished like this. OTOH, I'm not sure whether the PostGIS folk care
all that much about the intermediate-values-in-plpgsql-variables
scenario. They didn't bring it up in the discussion a year or so back
about their requirements. We do know very well that plpgsql array
variables are a performance pain point, so maybe fixing that is enough
of a goal for 9.5.

(BTW, the nested-function-calls case sure seems like it's dead center
of the wheelhouse for JSONB. Just sayin'. I do not myself have time
to think about applying this technology to JSONB right now, but does
anyone else want to step up?)

regards, tom lane

Attachment Content-Type Size
expanded-arrays-0.3.patch text/x-diff 87.0 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2015-02-17 06:35:23 Re: Allow "snapshot too old" error, to prevent bloat
Previous Message Kyotaro HORIGUCHI 2015-02-17 03:43:55 Re: Proposal : REINDEX xxx VERBOSE