Re: Manipulating complex types as non-contiguous structures in-memory

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Manipulating complex types as non-contiguous structures in-memory
Date: 2015-02-10 21:00:22
Message-ID: 20150210210022.GN3854@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom,

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> I've now taken this idea as far as building the required infrastructure
> and revamping a couple of array operators to use it. There's a lot yet
> to do, but I've done enough to get some preliminary ideas about
> performance (see below).

Nice!

> * Although I said above that everything owned by a deserialized object
> has to live in a single memory context, I do have ideas about relaxing
> that. The core idea would be to invent a "memory context reset/delete
> callback" feature in mcxt.c. Then a deserialized object could register
> such a callback on its own memory context, and use the callback to clean
> up resources outside its context. This is potentially useful for instance
> for something like PostGIS, where an object likely includes some data that
> was allocated with malloc not palloc because it was created by library
> functions that aren't Postgres-aware. Another likely use-case is for
> deserialized objects representing composite types to maintain reference
> counts on their tuple descriptors instead of having to copy said
> descriptors into their private contexts. This'd be material for a
> separate patch though.

Being able to register a callback to be used on deletion of the context
would certainly be very nice and strikes me as pretty independent of the
rest of this. You've probably thought of this already, but registering
the callback should probably allow the caller to pass in a pointer to be
passed back to the callback function when the delete happens, so that
there's a place for the metadata to be stored about what the callback
function needs to clean up when it's called.

> So that's the plan, and attached is a very-much-WIP patch that uses this
> approach to speed up plpgsql array element assignments (and not a whole
> lot else as yet). Here's the basic test case I've been using:

I've not looked at the code at all as yet, but it makes sense to me.

> With the attached patch, those timings drop to 80 and 150 ms respectively.

And those numbers are pretty fantastic and would address an area we
regularly get dinged on.

> Still, if the worst-case slowdown is around 20% on trivially-sized arrays,
> I'd gladly take that to have better performance on larger arrays. And I
> think this example is close to the worst case for the patch's approach,
> since it's testing small, fixed-element-length, no-nulls arrays, which is
> what the existing code can handle without spending a lot of cycles.

Agreed.

> BTW, I'm not all that thrilled with the "deserialized object" terminology.
> I found myself repeatedly tripping up on which form was serialized and
> which de-. If anyone's got a better naming idea I'm willing to adopt it.

Unfortunately, nothing comes to mind. Serialization is, at least, a
pretty well understood concept and so the naming will likely make sense
to newcomers, even if it's difficult to keep track of which is
serialized and which is deserialized.

> I'm not sure exactly how to push this forward. I would not want to
> commit it without converting a significant number of array functions to
> understand about deserialized inputs, and by the time I've finished that
> work it's likely to be too late for 9.5. OTOH I'm sure that the PostGIS
> folk would love to have this infrastructure in 9.5 not 9.6 so they could
> make a start on fixing their issues. (Further down the pike, I'd plan to
> look at adapting composite-type operations, JSONB, etc, to make use of
> this approach, but that certainly isn't happening for 9.5.)
>
> Thoughts, advice, better ideas?

I'm not really a big fan of putting an infrastructure out there for
modules to use that we don't use ourselves (particularly when it's clear
that there are places where we could/should be). On the other hand,
this doesn't impact on-disk format and therefore I'm less worried that
we'll end up with a release-critical issue when we're getting ready to
put 9.5 out there.

So, I'm on the fence about it. I'd love to see all of this in 9.5 with
the array functions converted, but I don't think it'd be horrible if
only a subset had been done in time for 9.5. The others aren't going to
go anywhere and will still work. I do think it'd be better to have at
least some core users of this new infrastructure rather than just
putting it out there for modules to use but I agree it'd be a bit grotty
to have only some of the array functions converted.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-02-10 22:15:40 Show the LSN in rm_redo_error_callback
Previous Message Andres Freund 2015-02-10 20:56:53 Re: Parallel Seq Scan