Re: Detection of nested function calls

From: Hugo Mercier <hugo(dot)mercier(at)oslandia(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Detection of nested function calls
Date: 2013-10-25 15:07:49
Message-ID: 526A8945.50206@oslandia.com
Lists: pgsql-hackers

On 25/10/2013 16:18, Tom Lane wrote:
> Hugo Mercier <hugo(dot)mercier(at)oslandia(dot)com> writes:
>> PostGIS functions that manipulate geometries have to unserialize their
>> input geometries from the 'flat' varlena representation to their own,
>> and serialize the processed geometries back when returning.
>> But in such nested call queries, this serialization-unserialization
>> process is just an overhead.
>
> This is a reasonable thing to worry about, not just for PostGIS types but
> for many container types such as arrays --- it'd be nice to be able to
> work with an in-memory representation that wasn't just a contiguous blob
> of data. For instance, assignment to an array element might become a
> constant-time operation even when working with variable-length datatypes.
>
>> So we thought having a way for user functions to know if they are part
>> of a nested call could allow them to avoid this serialization phase.
>
> However, this seems like a completely wrong way to go at it. In the first
> place, it wouldn't help for situations like a complex value stored in a
> plpgsql variable. In the second, I don't think that what you are
> describing scales to any more than the most trivial situations. What
> about functions with more than one complex-type input, for example? And
> you'd need to be certain that every single function taking or returning
> the datatype gets updated at exactly the same time, else it'll break.

About plpgsql variables: no, there won't be any optimization in that
case. When the function result has to be stored in a variable, it
must be serialized.

About functions with more than one complex-type input: as long as all
parameters are of the same type, there is no problem with that.
But if your function deals with more than one complex type AND you want
to avoid serialization on each parameter, then yes, each type must be
aware of this possible optimization (and choose whether to serialize or not).

I don't understand what you mean by "be certain that every single
function ... gets updated at exactly the same time". Could you elaborate?

>
> I think the right way to attack it is to create some way for a Datum
> value to indicate, at runtime, whether it's a flat value or an in-memory
> representation. Any given function returning the type could choose to
> return either representation. The datatype would have to provide a way
> to serialize the in-memory representation, when and if it came time to
> store it in a table. To avoid breaking functions that hadn't yet been
> taught about the new representation, we'd probably want to redefine the
> existing DETOAST macros as also invoking this datatype flattening
> function, and then you'd need to use some new access macro if you wanted
> visibility of the non-flat representation. (This assumes that the whole
> thing is only applicable to toastable datatypes, but that seems like a
> reasonable restriction.)

You're totally right. That is very close to what I am working on with
PostGIS.
This is still early work, but here are some details:

https://github.com/Oslandia/postgis/blob/nested_ref_passing/postgis/lwgeom_ref.h

Basically, the PostGIS 'geometry' type is extended here with a flag
saying whether the data is actual 'flat' data or a plain pointer. If it
is a pointer, a type identifier is stored as well.
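
To make the idea concrete, here is a minimal standalone sketch of such a
tagged value. It is not the layout used in lwgeom_ref.h: every name below
(value_header, flat_value, ref_value, value_is_pointer, value_type_id) is
made up for illustration, and it deliberately leaves out the varlena header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; this is not the actual layout from lwgeom_ref.h. */
enum { REPR_FLAT = 0, REPR_POINTER = 1 };

/* Common prefix shared by both representations. */
typedef struct {
    uint8_t kind;     /* REPR_FLAT or REPR_POINTER */
    uint8_t type_id;  /* which in-memory type; only meaningful for REPR_POINTER */
} value_header;

/* Serialized form: the header followed by the bytes themselves. */
typedef struct {
    value_header  hdr;
    uint32_t      len;
    unsigned char data[];
} flat_value;

/* Reference form: the header followed by a plain pointer to a live object. */
typedef struct {
    value_header hdr;
    void        *obj;
} ref_value;

/* Returns 1 if the value carries a pointer, 0 if it carries flat data. */
static int value_is_pointer(const void *v)
{
    assert(v != NULL);
    return ((const value_header *) v)->kind == REPR_POINTER;
}

/* Returns the stored type identifier, or -1 for flat data. */
static int value_type_id(const void *v)
{
    return value_is_pointer(v) ? ((const value_header *) v)->type_id : -1;
}
```

Because both structs start with the same header, a function can inspect
the flag without knowing in advance which representation it was handed.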

And there is a new DETOAST macro (here POSTGIS_DETOAST_DATUM) that
tests whether the Datum is a pointer and, if so, calls the
corresponding unserializing functions. So you can avoid copies if your
function is aware of that, and the changes needed for existing functions
will be minimal.

https://github.com/Oslandia/postgis/blob/nested_ref_passing/postgis/lwgeom_ref.c
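
As a rough illustration of how a function that knows about both
representations could avoid the copy, here is a standalone sketch. It
does not reproduce POSTGIS_DETOAST_DATUM or anything else in
lwgeom_ref.c: geom, flat_geom, ref_geom, geom_deserialize and
geom_from_datum are hypothetical names, and plain malloc stands in for
PostgreSQL memory contexts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All names here are hypothetical stand-ins, not PostGIS or PostgreSQL APIs. */
enum { REPR_FLAT = 0, REPR_POINTER = 1 };

/* In-memory form a function actually works on. */
typedef struct { uint32_t npoints; double *xy; } geom;

/* The two forms a function can receive; both start with the flag. */
typedef struct { uint8_t kind; uint32_t npoints; double xy[]; } flat_geom;
typedef struct { uint8_t kind; geom *obj; } ref_geom;

static int deserialize_calls = 0;   /* instrumentation for the demo only */

/* Build an in-memory geometry from the flat form (allocates and copies). */
static geom *geom_deserialize(const flat_geom *f)
{
    size_t n = sizeof(double) * 2 * (size_t) f->npoints;
    geom *g = malloc(sizeof *g);

    if (g == NULL)
        return NULL;
    g->npoints = f->npoints;
    g->xy = n ? malloc(n) : NULL;
    if (n && g->xy == NULL) {
        free(g);
        return NULL;
    }
    if (n)
        memcpy(g->xy, f->xy, n);
    deserialize_calls++;
    return g;
}

/*
 * Accessor in the spirit of a detoast macro: a pointer value is handed
 * back as-is (no copy), a flat value is deserialized first.  *owned is
 * set to 1 when the caller received a fresh copy it must release.
 */
static geom *geom_from_datum(const void *datum, int *owned)
{
    assert(datum != NULL && owned != NULL);
    if (*(const uint8_t *) datum == REPR_POINTER) {
        *owned = 0;
        return ((const ref_geom *) datum)->obj;
    }
    *owned = 1;
    return geom_deserialize((const flat_geom *) datum);
}
```

In a nested call such as f(g(x)), g would return the ref_geom form and f
would go through geom_from_datum without triggering a second
deserialization. A function that has not been updated would need an
accessor that goes the other way and flattens a pointer value first.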

You said "when and if it came time to store it in a table". That is
exactly the point of this 'nested' boolean: otherwise, from a function's
point of view, how do you know when it is time to store it in a table?

>
> Another thing that would have to be attacked in order to make the
> plpgsql-variable case work is that you'd need some design for copying such
> Datums in-memory, and perhaps a reference count mechanism to optimize away
> unnecessary copies. Your idea of tying the optimization to the nested
> function call scenario would avoid the need to solve this problem, but
> I think it's too narrow a scope to justify all the other work that'd be
> involved.

Do you think it must necessarily cover the plpgsql variable case to be
acceptable?

--
Hugo Mercier
Oslandia