Manipulating complex types as non-contiguous structures in-memory

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Manipulating complex types as non-contiguous structures in-memory
Date: 2015-02-10 20:00:35
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've been fooling around with a design to support computation-oriented,
not-necessarily-contiguous-blobs representations of datatypes in memory,
along the lines I mentioned here:

In particular this is meant to reduce the overhead for repeated operations
on arrays, records, etc. We've had several previous discussions about
that, and even some single-purpose patches such as in this thread:
There was also a thread discussing how this sort of thing could be
useful to PostGIS:
and it's been discussed a few other times too, but I'm too lazy to
search the archives any further.

I've now taken this idea as far as building the required infrastructure
and revamping a couple of array operators to use it. There's a lot yet
to do, but I've done enough to get some preliminary ideas about
performance (see below).

The core ideas of this patch are:

* Invent a new TOAST datum type "pointer to deserialized object", which
is physically just like the existing indirect-toast-pointer concept, but
it has a different va_tag code and somewhat different semantics.

* A deserialized object has a standard header (which is what the toast
pointers point to) and typically will have additional data-type-specific
fields after that. One component of the standard header is a pointer to
a set of "method" functions that provide ways to accomplish standard
data-type-independent operations on the deserialized object.

* Another standard header component is a MemoryContext identifier: the
header, as well as all subsidiary data belonging to the deserialized
object, must live in this context. (Well, I guess there could also be
child contexts.) By exposing an explicit context identifier, we can
accomplish tasks like "move this object into another context" by
reparenting the object's context rather than physically copying anything.

* The only standard "methods" I've found a need for so far are functions
to re-serialize the object, that is generate a plain varlena value that is
semantically equivalent. To avoid extra copying, this is split into
separate "compute the space needed" and "serialize into this memory"
steps, so that the result can be dropped exactly where the caller needs

* Currently, a deserialized object will be reserialized in that way
whenever we incorporate it into a physical tuple (ie, heap_form_tuple
or index_form_tuple), or whenever somebody applies datumCopy() to it.
I'd like to relax this later, but there's an awful lot of code that
supposes that heap_form_tuple or datumCopy will produce a self-contained
value that survives beyond, eg, destruction of the memory context that
contained the source Datums. We can get good speedups in a lot of
interesting cases without solving that problem, so I don't feel too bad
about leaving it as a future project.

* In particular, things like PG_GETARG_ARRAYTYPE_P() treat a deserialized
toast pointer as something to be detoasted, and will produce a palloc'd
re-serialized value. This means that we do not need to convert all the
C functions concerned with a given datatype at the same time (or indeed
ever); a function that hasn't been upgraded will build a re-serialized
representation and then operate on that. We'll invent alternate
argument-fetching functions that skip the reserialization step, for use
by functions that have been upgraded to handle either case. This is
basically the same approach we used when we introduced short varlena
headers, and that seems to have gone smoothly enough.

* There's a concept that a deserialized object has a "primary" toast
pointer, which is physically part of the object, as well as "secondary"
toast pointers which might or might not be part of the object. If you
have a Datum pointer to the primary toast pointer then you are authorized
to modify the object in-place; if you have a Datum pointer to a secondary
toast pointer then you must treat it as read-only (ie, you have to make a
copy if you're going to change it). Functions that construct a new
deserialized object always return its primary toast pointer; this allows a
nest of functions to modify an object in-place without copying, which was
the primary need that the PostGIS folks expressed. On the other hand,
plpgsql can hand out secondary toast pointers to deserialized objects
stored in plpgsql function variables, thus ensuring that the objects won't
be modified unexpectedly, while never having to physically copy them if
the called functions just need to inspect them.

* Primary and secondary pointers are physically identical, but the
primary pointer resides in a specific spot in the deserialized object's
standard header. (So you can tell if you've got the primary pointer via
a simple address comparison.)

* I've modified the array element assignment path in plpgsql's
exec_assign_value so that, instead of passing a secondary toast pointer
to array_set() as you might expect from the above, it passes the primary
toast pointer thus allowing array_set() to modify the variable in-place.
So an operation like "array_variable[x] := y" no longer incurs recopying
of the whole array, once the variable has been converted into deserialized
form. (If it's not yet, it becomes deserialized after the first such
assignment.) Also, assignment of an already-deserialized value to a
variable accomplishes that with a MemoryContext parent pointer swing
instead of physical copying, if what we have is the primary toast pointer,
which implies it's not referenced anywhere else.

* Any functions that plpgsql gives a read/write pointer to need to be
exceedingly careful to not leave a corrupted object behind if they fail
partway through. I've successfully written such a version of array_set(),
and it wasn't too painful, but this may be a limitation on the general
applicability of the whole approach.

* In the current patch, that restriction only applies to array_set()
anyway. But I would like to allow in-place updates for non-core cases.
For example in something like
hstore_var := hstore_var || 'foo=>bar';
we could plausibly pass a R/W pointer to hstore_concat and let it modify
hstore_var in place. But this would require knowing which such functions
are safe, or assuming that they all are, which might be an onerous

* I soon noticed that I was getting a lot of empty "deserialized array"
contexts floating around. The attached patch addresses this in a quick
hack fashion by redefining ResetExprContext() to use
MemoryContextResetAndDeleteChildren() instead of MemoryContextReset(),
so that deserialized objects created within an expression evaluation
context go completely away at ResetExprContext(), rather than being left
behind as empty subcontext shells. We've talked more than once about
redefining mcxt.c's API so that MemoryContextReset() means what's
currently meant by MemoryContextResetAndDeleteChildren(), and if you
really truly do want to keep empty child contexts around then you need to
call something else instead. I did not go that far here, but I think we
should seriously consider biting the bullet and finally changing it.

* Although I said above that everything owned by a deserialized object
has to live in a single memory context, I do have ideas about relaxing
that. The core idea would be to invent a "memory context reset/delete
callback" feature in mcxt.c. Then a deserialized object could register
such a callback on its own memory context, and use the callback to clean
up resources outside its context. This is potentially useful for instance
for something like PostGIS, where an object likely includes some data that
was allocated with malloc not palloc because it was created by library
functions that aren't Postgres-aware. Another likely use-case is for
deserialized objects representing composite types to maintain reference
counts on their tuple descriptors instead of having to copy said
descriptors into their private contexts. This'd be material for a
separate patch though.

So that's the plan, and attached is a very-much-WIP patch that uses this
approach to speed up plpgsql array element assignments (and not a whole
lot else as yet). Here's the basic test case I've been using:

create or replace function arraysetint(n int) returns int[] as $$
declare res int[] := '{}';
for i in 1 .. n loop
res[i] := i;
end loop;
return res;
$$ language plpgsql strict;

In HEAD, this function's runtime grows as O(N^2), so for example
(with casserts off on my x86_64 workstation):

regression=# select array_dims(arraysetint(100000));
(1 row)

Time: 7874.070 ms

With variable-length array elements, such as if you change the
int[] arrays to numeric[], it's even worse:

regression=# select array_dims(arraysetnum(100000));
(1 row)

Time: 31177.340 ms

With the attached patch, those timings drop to 80 and 150 ms respectively.

It's not all peaches and cream: for the array_append operator, which is
also accelerated by the patch (mainly because it is too much in bed with
array_set to not fix at the same time ;-)), I tried examples like

explain analyze select array[1,2] || g || g || g from generate_series(1,1000000) g;

Very roughly, HEAD needs about 400 ns per || operator in this scenario.
With the patch, it's about 480 ns for the first operator and then 200 more
for each one accepting a prior operator's output. (Those numbers could
perhaps be improved with more-invasive refactoring of the array code.)
The extra initial overhead represents the time to convert the array[1,2]
constant to deserialized form during each execution of the first operator.

Still, if the worst-case slowdown is around 20% on trivially-sized arrays,
I'd gladly take that to have better performance on larger arrays. And I
think this example is close to the worst case for the patch's approach,
since it's testing small, fixed-element-length, no-nulls arrays, which is
what the existing code can handle without spending a lot of cycles.

Note that I've kept all the deserialized-array-specific code in its own file
for now, just for ease of hacking. That stuff would need to propagate into
the main array-related files in a more complete patch.

BTW, I'm not all that thrilled with the "deserialized object" terminology.
I found myself repeatedly tripping up on which form was serialized and
which de-. If anyone's got a better naming idea I'm willing to adopt it.

I'm not sure exactly how to push this forward. I would not want to
commit it without converting a significant number of array functions to
understand about deserialized inputs, and by the time I've finished that
work it's likely to be too late for 9.5. OTOH I'm sure that the PostGIS
folk would love to have this infrastructure in 9.5 not 9.6 so they could
make a start on fixing their issues. (Further down the pike, I'd plan to
look at adapting composite-type operations, JSONB, etc, to make use of
this approach, but that certainly isn't happening for 9.5.)

Thoughts, advice, better ideas?

regards, tom lane

Attachment Content-Type Size
deserialized-arrays-0.1.patch text/x-diff 73.6 KB


Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2015-02-10 20:04:42 Re: RangeType internal use
Previous Message Stephen Frost 2015-02-10 19:06:43 Re: Odd behavior of updatable security barrier views on foreign tables