Re: Detection of nested function calls

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hugo Mercier <hugo(dot)mercier(at)oslandia(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Detection of nested function calls
Date: 2013-10-25 14:18:27
Message-ID: 2355.1382710707@sss.pgh.pa.us
Lists: pgsql-hackers

Hugo Mercier <hugo(dot)mercier(at)oslandia(dot)com> writes:
> PostGIS functions that manipulate geometries have to unserialize their
> input geometries from the 'flat' varlena representation to their own,
> and serialize the processed geometries back when returning.
> But in such nested call queries, this serialization-unserialization
> process is just an overhead.

This is a reasonable thing to worry about, not just for PostGIS types but
for many container types such as arrays --- it'd be nice to be able to
work with an in-memory representation that wasn't just a contiguous blob
of data. For instance, assignment to an array element might become a
constant-time operation even when working with variable-length datatypes.
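To illustrate the array case, here is a minimal sketch in plain C, not PostgreSQL code. It assumes a hypothetical in-memory array (`PtrArray`) that holds one pointer per element. Replacing an element then costs only the copy of the new value, regardless of how many other elements exist or how large they are. In a contiguous blob, a new value of a different length would force every following byte to be shifted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory array: one separately allocated string per slot. */
typedef struct {
    size_t  nelems;
    char  **elems;
} PtrArray;

/* Helper: heap-allocated copy of a C string, or NULL on allocation failure. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/*
 * Replace element i. Only the new value is copied; no other element is
 * moved, whatever its length. Returns 0 on success, -1 on a bad index or
 * allocation failure.
 */
int ptr_array_set(PtrArray *a, size_t i, const char *value)
{
    assert(a != NULL && value != NULL);
    if (i >= a->nelems)
        return -1;
    char *copy = copy_string(value);
    if (copy == NULL)
        return -1;
    free(a->elems[i]);
    a->elems[i] = copy;
    return 0;
}
```

The price is that such a structure cannot be written to disk as-is, which is why a flattening step is still needed at storage time.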

> So we thought having a way for user functions to know if they are part
> of a nested call could allow them to avoid this serialization phase.

However, this seems like a completely wrong way to go at it. In the first
place, it wouldn't help for situations like a complex value stored in a
plpgsql variable. In the second, I don't think that what you are
describing scales to any more than the most trivial situations. What
about functions with more than one complex-type input, for example? And
you'd need to be certain that every single function taking or returning
the datatype gets updated at exactly the same time, else it'll break.

I think the right way to attack it is to create some way for a Datum
value to indicate, at runtime, whether it's a flat value or an in-memory
representation. Any given function returning the type could choose to
return either representation. The datatype would have to provide a way
to serialize the in-memory representation, when and if it came time to
store it in a table. To avoid breaking functions that hadn't yet been
taught about the new representation, we'd probably want to redefine the
existing DETOAST macros as also invoking this datatype flattening
function, and then you'd need to use some new access macro if you wanted
visibility of the non-flat representation. (This assumes that the whole
thing is only applicable to toastable datatypes, but that seems like a
reasonable restriction.)
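
The scheme described above can be sketched roughly as follows. This is a self-contained illustration in plain C, not PostgreSQL code: the tag values, the `FlatGeom`/`InMemGeom` structs, the `Value` stand-in for a Datum, and the accessors `get_flat` and `get_inmem` are all hypothetical names invented for the example. `get_flat` plays the role of the redefined DETOAST macro (legacy callers always see a flat value), while `get_inmem` plays the role of the new access macro.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation tag; not an existing PostgreSQL definition. */
enum { REP_FLAT = 0, REP_INMEM = 1 };

/* Flat form: one contiguous blob, suitable for storing in a table. */
typedef struct {
    uint8_t  kind;      /* always REP_FLAT */
    uint32_t npoints;
    double   coords[];  /* npoints values follow the header */
} FlatGeom;

/* In-memory form: the payload lives in a separate allocation. */
typedef struct {
    uint8_t  kind;      /* always REP_INMEM */
    uint32_t npoints;
    double  *coords;
} InMemGeom;

/* Stand-in for a pass-by-reference Datum: points at either form. */
typedef void *Value;

/* Both forms start with the tag byte, so it can be read at runtime. */
int value_kind(Value v)
{
    assert(v != NULL);
    return *(const uint8_t *) v;
}

/* Datatype-supplied flattening function: in-memory form to blob. */
FlatGeom *geom_flatten(const InMemGeom *g)
{
    size_t bytes = (size_t) g->npoints * sizeof(double);
    FlatGeom *f = malloc(sizeof(FlatGeom) + bytes);
    if (f == NULL)
        return NULL;
    f->kind = REP_FLAT;
    f->npoints = g->npoints;
    memcpy(f->coords, g->coords, bytes);
    return f;
}

/* Datatype-supplied expansion function: blob to in-memory form. */
InMemGeom *geom_expand(const FlatGeom *f)
{
    size_t bytes = (size_t) f->npoints * sizeof(double);
    InMemGeom *g = malloc(sizeof(InMemGeom));
    if (g == NULL)
        return NULL;
    g->coords = malloc(bytes ? bytes : 1);
    if (g->coords == NULL) {
        free(g);
        return NULL;
    }
    g->kind = REP_INMEM;
    g->npoints = f->npoints;
    memcpy(g->coords, f->coords, bytes);
    return g;
}

/* Legacy accessor: functions not yet updated always get the flat form. */
FlatGeom *get_flat(Value v)
{
    if (value_kind(v) == REP_FLAT)
        return (FlatGeom *) v;
    return geom_flatten((const InMemGeom *) v);
}

/* New accessor: updated functions get the in-memory form without copying
 * when the value already has it. */
InMemGeom *get_inmem(Value v)
{
    if (value_kind(v) == REP_INMEM)
        return (InMemGeom *) v;
    return geom_expand((const FlatGeom *) v);
}
```

Because the tag travels with the value, a function with several complex-type inputs can check each one independently, and old and new functions can be mixed freely.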

Another thing that would have to be attacked in order to make the
plpgsql-variable case work is that you'd need some design for copying such
Datums in-memory, and perhaps a reference count mechanism to optimize away
unnecessary copies. Your idea of tying the optimization to the nested
function call scenario would avoid the need to solve this problem, but
I think it's too narrow a scope to justify all the other work that'd be
involved.
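
As an illustration of the copying problem, here is a minimal sketch of one possible reference-count design in plain C, not PostgreSQL code. The `SharedValue` type and the `sv_*` functions are hypothetical, and the copy-on-write policy in `sv_make_writable` is an assumption of the example rather than something proposed in this thread. "Copying" a value into a second variable only bumps a counter; a real copy is made only when a holder wants to modify a value that is still shared.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted in-memory value. */
typedef struct {
    int     refcount;
    size_t  len;
    double *data;
} SharedValue;

/* Create a zero-filled value with a single owner. */
SharedValue *sv_new(size_t len)
{
    SharedValue *v = malloc(sizeof(SharedValue));
    if (v == NULL)
        return NULL;
    v->data = calloc(len ? len : 1, sizeof(double));
    if (v->data == NULL) {
        free(v);
        return NULL;
    }
    v->refcount = 1;
    v->len = len;
    return v;
}

/* "Copy" for read-only use: no data is duplicated. */
SharedValue *sv_share(SharedValue *v)
{
    assert(v->refcount > 0);
    v->refcount++;
    return v;
}

/* Drop one reference; free the value when the last one goes away. */
void sv_release(SharedValue *v)
{
    assert(v->refcount > 0);
    if (--v->refcount == 0) {
        free(v->data);
        free(v);
    }
}

/*
 * Return a value the caller may modify. A sole owner gets the same object
 * back; otherwise the caller's reference is moved to a private copy.
 */
SharedValue *sv_make_writable(SharedValue *v)
{
    if (v->refcount == 1)
        return v;
    SharedValue *copy = sv_new(v->len);
    if (copy == NULL)
        return NULL;
    memcpy(copy->data, v->data, v->len * sizeof(double));
    v->refcount--;
    return copy;
}
```

A real design would also have to decide which memory context owns such a value and who releases it when a query is aborted, which this sketch does not attempt.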

Some colleagues of mine at Salesforce have been playing with ideas like
this, though last I heard they were nowhere near having a submittable
patch.

regards, tom lane
