Re: Manipulating complex types as non-contiguous structures in-memory

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Manipulating complex types as non-contiguous structures in-memory
Date: 2015-05-10 16:09:41
Message-ID: 12295.1431274181@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> Looking at this. First reading the patch to understand the details.

> * The VARTAG_IS_EXPANDED(tag) trick in VARTAG_SIZE is unlikely to
> beneficial, before the compiler could implement the whole thing as a
> computed goto or lookup table, afterwards not.

Well, if you're worried about the speed of VARTAG_SIZE() then the right
thing to do would be to revert your change that made enum vartag_external
distinct from the size of the struct, so that we could go back to just
using the second byte of a varattrib_1b_e datum as its size. As I said
at the time, inserting pad bytes to force each different type of toast
pointer to be a different size would probably be a better tradeoff than
what commit 3682025015 did.

> * It'd be nice if the get_flat_size comment in expandeddatm.h could
> specify whether the header size is included. That varies enough around
> toast that it seems worthwhile.

OK.

> * You were rather bothered by the potential of multiple evaluations for
> the ilist stuff. And now the AARR macros are full of them...

Yeah, there is doubtless some added cost there. But I think it's a better
answer than duplicating each function in toto; the code space that that
would take isn't free either.

> * I find the ARRAY_ITER_VARS/ARRAY_ITER_NEXT macros rather ugly. I don't
> buy the argument that turning them into functions will be slower. I'd
> bet the contrary on common platforms.

Perhaps; do you want to do some testing and see?

> * Not a fan of the EH_ prefix in array_expanded.c and EOH_
> elsewhere. Just looks ugly to me. Whatever.

I'm not wedded to that naming if you have a better suggestion.

> * The list of hardwired safe ops in exec_check_rw_parameter is somewhat
> sad. Don't have a better idea though.

It's very sad, and it will be high on my list to improve that in 9.6.
But I do not think it's a fatal problem to ship it that way in 9.5,
because *as things stand today* those are the only two functions that
could benefit anyway. It won't really matter until we have extensions
that want to use this mechanism.

> * "Also, a C function that is modifying a read-write expanded value
> in-place should take care to leave the value in a sane state if it
> fails partway through." - that's a pretty hefty requirement imo.

It is, which is one reason that I'm nervous about relaxing the test in
exec_check_rw_parameter. Still, it was possible to code array_set_element
to meet that restriction without too much pain. I'm inclined to leave the
stronger requirement in the docs for now, until we get more push-back.

> * The forced RW->RO conversion in subquery scans is a bit sad, but I
> seems like something left for later.

Yes, there are definitely some things that look like future opportunities
here.

> Somewhere in the thread you comment on the fact that it's a bit sad that
> plpgsql is the sole benefactor of this (unless some function forces
> expansion internally). I'd be ok to leave it at that for now. It'd be
> quite cool to get some feedback from postgis folks about the suitability
> of this for their cases.

Indeed, that's the main reason I'm eager to ship something in 9.5, even if
it's not perfect. I hope those guys will look at it and use it, and maybe
give us feedback on ways to improve it.

> ISTM that the worst case for the new situation is large arrays that
> exist as plpgsql variables but are only ever passed on.

Well, more to the point, large arrays that are forced into expanded format
(costing a conversion step) but then we never do anything with them that
would benefit from that. Just saying they're "passed on" doesn't prove
much since the called function might or might not get any benefit.
array_length doesn't, but some other things would.

Your example with array_agg() is interesting, since one of the things on
my to-do list is to see whether we could change array_agg to return an
expanded array. It would not be hard to make it build that representation
directly, instead of its present ad-hoc internal state. The trick would
be, when can you return the internal state without an additional copy
step? But maybe it could return a R/O pointer ...

> ... Expanding only in
> cases where it'd be beneficial is going to be hard.

Yeah, improving that heuristic looks like a research project. Still, even
with all the limitations and to-do items in the patch now, I'm pretty sure
this will be a net win for practically all applications.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message José Luis Tallón 2015-05-10 16:41:08 Re: multixacts woes
Previous Message Tom Lane 2015-05-10 15:42:13 Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)