Re: Manipulating complex types as non-contiguous structures in-memory

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Manipulating complex types as non-contiguous structures in-memory
Date: 2015-05-05 22:50:13
Message-ID: 29085.1430866213@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> There is a significant slowdown on the following tests:

>> do $$ declare a int[] := '{}'; begin
>>   for i in 1..90000 loop a := a || 10; end loop;
>> end$$ language plpgsql;
>> do $$ declare a numeric[] := '{}'; begin
>>   for i in 1..90000 loop a := a || 10.1; end loop;
>> end$$ language plpgsql;

>> integer: master 14 sec, patched 55 sec
>> numeric: master 43 sec, patched 108 sec

>> It is probably the worst case, and it is a known plpgsql antipattern

> Yeah, I have not expended a great deal of effort on the array_append/
> array_prepend/array_cat code paths. Still, in these plpgsql cases,
> we should in principle have gotten down from two array copies per loop to
> one, so it's disappointing to not have better results there, even granting
> that the new "copy" step is not just a byte-by-byte copy. Let me see if
> there's anything simple to be done about that.

The attached updated patch reduces both of those do-loop tests to about
60 msec on my machine. It contains two improvements over the 1.1 patch:

1. There's a fast path for copying an expanded array to another expanded
array when the element type is pass-by-value: we can just memcpy the
Datum array instead of working element-by-element. In isolation, that
change made the patch a little faster than 9.4 on your int-array case,
though of course it doesn't help for the numeric-array case (and I do not
see a way to avoid working element-by-element for pass-by-ref cases).

2. pl/pgsql now detects cases like "a := a || x" and allows the array "a"
to be passed as a read-write pointer to array_append, so that array_append
can modify expanded arrays in-place and avoid inessential data copying
altogether. (The earlier patch had made array_append and array_prepend
safe for this usage, but there wasn't actually any way to invoke them
with read-write pointers.) I had speculated about doing this in my
earliest discussion of this patch, but there was no code for it before.
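
As a rough illustration of the fast path described in point 1, here is a
minimal C sketch. It is not the patch's code: the Word typedef, the ExpArray
struct, and exp_array_copy are hypothetical stand-ins, and pass-by-reference
elements are assumed to have a fixed size, which real varlena types such as
numeric do not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Datum: one machine word holding either the
 * value itself (pass-by-value) or a pointer to it (pass-by-reference). */
typedef uintptr_t Word;

/* Hypothetical, heavily simplified "expanded array". */
typedef struct {
    Word  *vals;       /* one word per element */
    size_t n;          /* number of elements */
    int    by_value;   /* nonzero if each element lives directly in its Word */
    size_t elem_size;  /* size of each pointed-to element when !by_value */
} ExpArray;

/* Copy src into freshly allocated storage in dst.
 * Returns 0 on success, -1 if an allocation fails. */
int exp_array_copy(const ExpArray *src, ExpArray *dst)
{
    assert(src != NULL && dst != NULL);
    assert(src->by_value || src->elem_size > 0);

    dst->n = src->n;
    dst->by_value = src->by_value;
    dst->elem_size = src->elem_size;
    dst->vals = malloc((src->n ? src->n : 1) * sizeof(Word));
    if (dst->vals == NULL)
        return -1;

    if (src->by_value) {
        /* Fast path: the words are the values, so one memcpy is enough. */
        memcpy(dst->vals, src->vals, src->n * sizeof(Word));
        return 0;
    }

    /* Slow path: each word is a pointer, so every element is copied
     * individually into its own allocation. */
    for (size_t i = 0; i < src->n; i++) {
        void *p = malloc(src->elem_size);
        if (p == NULL) {
            while (i > 0)
                free((void *) dst->vals[--i]);
            free(dst->vals);
            dst->vals = NULL;
            return -1;
        }
        memcpy(p, (const void *) src->vals[i], src->elem_size);
        dst->vals[i] = (Word) p;
    }
    return 0;
}
```

The by-value branch does a single bulk copy regardless of element count,
while the by-reference branch cannot avoid touching each element.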

The key question for change #2 is how to identify a "safe" top-level
function, that is, one that can be trusted not to corrupt the read-write
value if it fails partway through. I did not have a good answer before,
and I still don't; what this version of the patch does is hard-wire
array_append and array_prepend as the functions considered safe.
Obviously that is crying out for improvement, but we can leave that
question for later; at least now we have infrastructure that makes it
possible to do it.
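
To make the idea concrete, here is a minimal C sketch of dispatching an
assignment like "a := a || x" either to an in-place append or to a copying
append, depending on a hard-wired list of trusted function names. Everything
in it (IntArray, is_safe_inplace_func, append_inplace, append_copy,
assign_append) is hypothetical and only illustrates the shape of the
decision; the patch itself may identify the functions quite differently, and
the sketch implements only appending even for the array_prepend name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable int array standing in for an expanded array. */
typedef struct {
    int   *vals;
    size_t n;    /* elements in use */
    size_t cap;  /* elements allocated */
} IntArray;

/* Hypothetical hard-wired whitelist of functions trusted to leave a
 * read-write argument intact if they fail partway through. */
int is_safe_inplace_func(const char *fname)
{
    return strcmp(fname, "array_append") == 0 ||
           strcmp(fname, "array_prepend") == 0;
}

/* Append in place. Space is reserved before the array is modified, so a
 * failed allocation leaves *a unchanged. Returns 0 on success, -1 on failure. */
int append_inplace(IntArray *a, int x)
{
    assert(a != NULL);
    if (a->n == a->cap) {
        size_t newcap = a->cap ? a->cap * 2 : 8;
        int *p = realloc(a->vals, newcap * sizeof(int));
        if (p == NULL)
            return -1;
        a->vals = p;
        a->cap = newcap;
    }
    a->vals[a->n++] = x;
    return 0;
}

/* Copying append: builds the result in *out and never touches *a. */
int append_copy(const IntArray *a, int x, IntArray *out)
{
    assert(a != NULL && out != NULL);
    out->cap = a->n + 1;
    out->vals = malloc(out->cap * sizeof(int));
    if (out->vals == NULL)
        return -1;
    if (a->n > 0)
        memcpy(out->vals, a->vals, a->n * sizeof(int));
    out->vals[a->n] = x;
    out->n = a->n + 1;
    return 0;
}

/* Evaluate "target := f(target, x)", where fname names the top-level
 * function. Trusted functions get the read-write target directly;
 * anything else works on a copy that then replaces the old value. */
int assign_append(IntArray *target, const char *fname, int x)
{
    IntArray tmp;

    if (is_safe_inplace_func(fname))
        return append_inplace(target, x);
    if (append_copy(target, x, &tmp) != 0)
        return -1;
    free(target->vals);
    *target = tmp;
    return 0;
}
```

With the in-place path a loop of N appends does amortized constant work per
iteration, whereas the copying path copies the whole array every time, which
is where the quadratic behavior of the test loops comes from.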

Change #1 is actually not relevant to these example cases, because we
don't copy any arrays within the loop given change #2. But I left it in
because it's not much code and it will help for situations where change #2
doesn't apply.

regards, tom lane

Attachment: expanded-arrays-1.2.patch (text/x-diff, 159.4 KB)
