Re: Manipulating complex types as non-contiguous structures in-memory

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Manipulating complex types as non-contiguous structures in-memory
Date: 2015-05-10 23:06:56
Lists: pgsql-hackers

On 2015-05-10 12:09:41 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > Looking at this. First reading the patch to understand the details.
> > * The VARTAG_IS_EXPANDED(tag) trick in VARTAG_SIZE is unlikely to be
> > beneficial: before it, the compiler could implement the whole thing as
> > a computed goto or lookup table; afterwards, it can't.
> Well, if you're worried about the speed of VARTAG_SIZE() then the right
> thing to do would be to revert your change that made enum vartag_external
> distinct from the size of the struct, so that we could go back to just
> using the second byte of a varattrib_1b_e datum as its size. As I said
> at the time, inserting pad bytes to force each different type of toast
> pointer to be a different size would probably be a better tradeoff than
> what commit 3682025015 did.

I doubt that'd be a net positive. Anyway, all I'm saying is that I can't
see the VARTAG_IS_EXPANDED trick being beneficial in comparison to
checking both explicit values.
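Here is a minimal sketch of the two styles of size lookup being compared. The tag values mirror the shape of enum vartag_external, but they, the placeholder struct sizes, and every demo_* name are assumptions for illustration only. Whether a given compiler actually emits a jump or lookup table for the switch depends on the compiler and flags. The explicit form simply leaves that option open, while the masked test is an extra arithmetic check mixed into the comparison chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins shaped like enum vartag_external; illustrative values. */
enum demo_vartag
{
    DEMO_TAG_INDIRECT = 1,
    DEMO_TAG_EXPANDED_RO = 2,
    DEMO_TAG_EXPANDED_RW = 3,
    DEMO_TAG_ONDISK = 18
};

/* Placeholder pointer-struct sizes, not the real ones. */
#define DEMO_SIZE_INDIRECT 8
#define DEMO_SIZE_EXPANDED 8
#define DEMO_SIZE_ONDISK 16

/* Style 1: fold RO and RW into one test by masking off the low bit. */
#define DEMO_TAG_IS_EXPANDED(tag) (((tag) & ~1) == DEMO_TAG_EXPANDED_RO)

size_t demo_vartag_size_mask(int tag)
{
    if (tag == DEMO_TAG_INDIRECT)
        return DEMO_SIZE_INDIRECT;
    if (DEMO_TAG_IS_EXPANDED(tag))
        return DEMO_SIZE_EXPANDED;
    if (tag == DEMO_TAG_ONDISK)
        return DEMO_SIZE_ONDISK;
    assert(!"unrecognized vartag");
    return 0;
}

/* Style 2: check each explicit value; a dense switch like this gives the
 * compiler the option of a jump table or lookup table. */
size_t demo_vartag_size_explicit(int tag)
{
    switch (tag)
    {
        case DEMO_TAG_INDIRECT:
            return DEMO_SIZE_INDIRECT;
        case DEMO_TAG_EXPANDED_RO:
        case DEMO_TAG_EXPANDED_RW:
            return DEMO_SIZE_EXPANDED;
        case DEMO_TAG_ONDISK:
            return DEMO_SIZE_ONDISK;
    }
    assert(!"unrecognized vartag");
    return 0;
}
```

Both functions return the same size for every valid tag. They differ only in what the compiler has to work with.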

> > * You were rather bothered by the potential of multiple evaluations for
> > the ilist stuff. And now the AARR macros are full of them...
> Yeah, there is doubtless some added cost there. But I think it's a better
> answer than duplicating each function in toto; the code space that that
> would take isn't free either.

Yeah, duplicating would be horrid. I'm more thinking of declaring some
iterator state outside the macro, or just using an inline function.
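For reference, here is a minimal sketch of the multiple-evaluation issue and the inline-function alternative. The struct, the demo_* macro and function, and the call counter are hypothetical stand-ins, loosely shaped like an AARR_*-style accessor that must handle both flat and expanded arrays. They are not the patch's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical "any array": either flat or expanded. */
typedef struct DemoAnyArray
{
    bool is_expanded;
    int  flat_ndim;
    int  expanded_ndim;
} DemoAnyArray;

/* Macro style: 'a' is textually repeated, so an argument with side effects
 * (or an expensive expression) is evaluated twice per use. */
#define DEMO_AARR_NDIM(a) \
    ((a)->is_expanded ? (a)->expanded_ndim : (a)->flat_ndim)

/* Inline-function style: the argument is evaluated exactly once. */
static inline int demo_aarr_ndim(const DemoAnyArray *a)
{
    return a->is_expanded ? a->expanded_ndim : a->flat_ndim;
}

/* Helper that counts its calls, to expose how often an argument is evaluated. */
int demo_eval_count = 0;

const DemoAnyArray *demo_fetch(const DemoAnyArray *a)
{
    assert(a != 0);
    demo_eval_count++;
    return a;
}
```

With the macro, `DEMO_AARR_NDIM(demo_fetch(&arr))` calls demo_fetch twice: once for the test and once for the chosen branch. The inline function calls it once.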

> > * I find the ARRAY_ITER_VARS/ARRAY_ITER_NEXT macros rather ugly. I don't
> > buy the argument that turning them into functions will be slower. I'd
> > bet the contrary on common platforms.
> Perhaps; do you want to do some testing and see?

Not exactly with great joy, but I will.
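One possible shape for the function-based alternative, sketched under simplifying assumptions: the caller declares an explicit iterator struct instead of having ARRAY_ITER_VARS inject variables. Elements are plain ints stored only for non-null entries, and the null bitmap uses a set bit to mean "not null", the convention of PostgreSQL's flat arrays. All demo_* names are hypothetical, and whether this beats the macros is exactly what would need measuring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state, declared once by the caller. */
typedef struct DemoArrayIter
{
    const int     *dataptr; /* next stored (non-null) value */
    const uint8_t *bitmap;  /* null bitmap, bit set = not null; NULL if no nulls */
    size_t         index;   /* element index, used to address the bitmap */
} DemoArrayIter;

void demo_array_iter_setup(DemoArrayIter *it, const int *values,
                           const uint8_t *nullbits)
{
    assert(it != NULL);
    it->dataptr = values;
    it->bitmap = nullbits;
    it->index = 0;
}

/* Return the next element and set *isnull. Null elements occupy no
 * storage, so the data pointer advances only for non-null ones. The caller
 * must not iterate past the element count. */
int demo_array_iter_next(DemoArrayIter *it, bool *isnull)
{
    size_t i = it->index++;

    if (it->bitmap != NULL && (it->bitmap[i / 8] & (1u << (i % 8))) == 0)
    {
        *isnull = true;
        return 0;
    }
    *isnull = false;
    return *it->dataptr++;
}
```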

> > * The list of hardwired safe ops in exec_check_rw_parameter is somewhat
> > sad. Don't have a better idea though.
> It's very sad, and it will be high on my list to improve that in 9.6.

> But I do not think it's a fatal problem to ship it that way in 9.5,
> because *as things stand today* those are the only two functions that
> could benefit anyway. It won't really matter until we have extensions
> that want to use this mechanism.

Agreed that it's not fatal.
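Here is a minimal sketch of what a hard-wired whitelist check of this kind amounts to. The two function ids (assumed here to stand for array_append and array_prepend), the DemoArg representation, and the rule that the target variable may appear only as a direct argument are simplifying assumptions. The real check in PL/pgSQL walks expression trees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ids standing in for the two whitelisted functions. */
#define DEMO_FN_ARRAY_APPEND  1
#define DEMO_FN_ARRAY_PREPEND 2

/* Hypothetical, flattened view of one call argument. */
typedef struct DemoArg
{
    bool is_param;    /* a plain variable reference (target or not) */
    bool refs_target; /* a non-param expression that mentions the target */
} DemoArg;

/* Is it safe to hand the target variable to this call as a R/W pointer? */
bool demo_rw_param_ok(int funcid, const DemoArg *args, size_t nargs)
{
    assert(args != NULL || nargs == 0);

    /* Only functions known to cope with in-place modification qualify. */
    if (funcid != DEMO_FN_ARRAY_APPEND && funcid != DEMO_FN_ARRAY_PREPEND)
        return false;

    /* The target may appear as a direct argument, but not nested inside
     * another argument expression that could see it half-modified. */
    for (size_t i = 0; i < nargs; i++)
    {
        if (args[i].is_param)
            continue;
        if (args[i].refs_target)
            return false;
    }
    return true;
}
```

Extending this to extensions would mean replacing the fixed id comparison with some per-function property. That is the 9.6 improvement mentioned above, not something this sketch attempts.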

> > ISTM that the worst case for the new situation is large arrays that
> > exist as plpgsql variables but are only ever passed on.
> Well, more to the point, large arrays that are forced into expanded format
> (costing a conversion step) but then we never do anything with them that
> would benefit from that. Just saying they're "passed on" doesn't prove
> much since the called function might or might not get any benefit.
> array_length doesn't, but some other things would.

Right. But I'm not sure it's that uncommon.
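To put rough numbers on the "forced into expanded format but never benefiting" case, here is a toy cost model. The unit costs are hypothetical and are not measurements. It assumes a flat array whose elements are variable-width or may be null, so reaching element i means walking from the start (fixed-width, null-free arrays have O(1) access and gain even less). Expansion is modeled as one pass over all n elements, followed by direct indexing.

```c
#include <assert.h>
#include <stddef.h>

/* Flat access: each fetch of element i walks i + 1 elements. */
unsigned long demo_cost_flat(const size_t *fetch_idx, size_t nfetch)
{
    unsigned long cost = 0;

    assert(nfetch == 0 || fetch_idx != NULL);
    for (size_t k = 0; k < nfetch; k++)
        cost += fetch_idx[k] + 1;
    return cost;
}

/* Expanded access: pay n once to expand, then 1 per fetch regardless of
 * position. A header-only operation such as array_length fetches no
 * elements, so for it nfetch == 0 and the expansion is pure overhead. */
unsigned long demo_cost_expanded(size_t nelems, const size_t *fetch_idx,
                                 size_t nfetch)
{
    (void) fetch_idx; /* position is irrelevant once expanded */
    return (unsigned long) nelems + nfetch;
}
```

For a 1000-element array passed only to something like array_length, the model gives 0 for flat and 1000 for expanded. Three fetches of the last element give 3000 versus 1003.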

> Your example with array_agg() is interesting, since one of the things on
> my to-do list is to see whether we could change array_agg to return an
> expanded array.

Well, I chose array_agg only because it was a trivial way to generate a
large array. The values could actually come from disk or something.

> It would not be hard to make it build that representation
> directly, instead of its present ad-hoc internal state. The trick would
> be, when can you return the internal state without an additional copy
> step? But maybe it could return a R/O pointer ...

R/O or R/W?
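To make the R/O versus R/W distinction concrete: a read-write reference lets the recipient modify (or take over) the object in place, while a read-only one obliges it to copy before any modification. Here is a minimal sketch of that contract, using a hypothetical DemoExpanded stand-in and a hypothetical demo_append. If an aggregate could hand back its state as a R/W reference, the copy step in the R/O branch is what it would avoid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical expanded-array stand-in. */
typedef struct DemoExpanded
{
    int    *values;
    size_t  nelems;
} DemoExpanded;

/* Hypothetical reference carrying the R/O or R/W permission. */
typedef struct DemoRef
{
    DemoExpanded *obj;
    bool          read_write;
} DemoRef;

/* Append in place when holding a R/W reference; otherwise copy first. */
DemoExpanded *demo_append(DemoRef ref, int value)
{
    DemoExpanded *target = ref.obj;

    if (!ref.read_write)
    {
        /* R/O: the original must stay untouched, so build a private copy. */
        target = malloc(sizeof(DemoExpanded));
        assert(target != NULL);
        target->nelems = ref.obj->nelems;
        target->values = malloc((target->nelems + 1) * sizeof(int));
        assert(target->values != NULL);
        if (ref.obj->nelems > 0)
            memcpy(target->values, ref.obj->values,
                   ref.obj->nelems * sizeof(int));
    }
    else
    {
        /* R/W: grow the existing object directly, no copy. */
        int *grown = realloc(target->values,
                             (target->nelems + 1) * sizeof(int));
        assert(grown != NULL);
        target->values = grown;
    }
    target->values[target->nelems++] = value;
    return target;
}
```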

> > ... Expanding only in
> > cases where it'd be beneficial is going to be hard.
> Yeah, improving that heuristic looks like a research project. Still, even
> with all the limitations and to-do items in the patch now, I'm pretty sure
> this will be a net win for practically all applications.

I wonder if we could somehow 'mark' other toast pointers as 'expand if
useful'. I.e. have something pretty much like ExpandedObjectHeader,
except that it initially works like the indirect toast stuff. So
eoh_context is set, but the data is still in the original datum. When
accessed via 'plain' accessors that don't know about the expanded format
the pointed-to datum is returned. But when accessed by something
"desiring" the expanded version, it's expanded. It seems that'd be
doable by extending the new infrastructure a bit more.
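Here is a rough sketch of that "expand if useful" idea, under heavy simplifying assumptions. The object starts out merely pointing at the original flat data, much like an indirect pointer. Only an expansion-aware accessor triggers building the expanded copy. DemoLazyArray and the demo_* functions are hypothetical. Flat and expanded layouts are identical ints here, so a plain accessor can return the expanded copy once it exists. A real design would have to flatten it back instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lazily expanded object. */
typedef struct DemoLazyArray
{
    const int *flat;     /* original datum; authoritative until expanded */
    size_t     nelems;
    int       *expanded; /* NULL until an accessor "desires" expansion */
} DemoLazyArray;

void demo_lazy_init(DemoLazyArray *la, const int *flat, size_t nelems)
{
    la->flat = flat;
    la->nelems = nelems;
    la->expanded = NULL;
}

/* 'Plain' accessor that knows nothing about expansion: gets the pointed-to
 * data (or, in this simplified model, the expanded copy if one exists). */
const int *demo_lazy_get_flat(const DemoLazyArray *la)
{
    return la->expanded != NULL ? la->expanded : la->flat;
}

/* Expansion-aware accessor: expand on first request, then reuse. */
int *demo_lazy_get_expanded(DemoLazyArray *la)
{
    if (la->expanded == NULL)
    {
        la->expanded = malloc(la->nelems > 0 ? la->nelems * sizeof(int) : 1);
        assert(la->expanded != NULL);
        if (la->nelems > 0)
            memcpy(la->expanded, la->flat, la->nelems * sizeof(int));
    }
    return la->expanded;
}

void demo_lazy_free(DemoLazyArray *la)
{
    free(la->expanded);
    la->expanded = NULL;
}
```

Consumers that never ask for the expanded form pay nothing beyond the extra indirection. That is the property the proposal is after.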


Andres Freund
