Re: [HACKERS] [PATCH] Generic type subscripting

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Generic type subscripting
Date: 2018-01-08 17:23:14
Message-ID: CA+q6zcX5bm=WP3g+g17LnfTn_Wk31rZasNvuwuPp2JXTzOUwuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 7 January 2018 at 23:39, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Dmitry Dolgov <9erthalion6(at)gmail(dot)com> writes:
>>> On 4 January 2018 at 03:05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I wonder what happened to the plan to separate assignment and fetch into two
>>> different node types. I can see that that didn't happen so far as primnodes.h
>>> is concerned, but you've made some changes that seem to assume it did happen.
>
>> There was one version of this patch that followed this plan. It turns out that
>> it's quite unnatural approach (at least within the current implementation),
>> because I had to duplicate or reuse a lot of code for those two node types.
>
> I'm not entirely convinced by that argument. Sure, there would be a lot of
> duplicative code in the node support functions, but that's inherent in our
> approach to those functions. I'm more concerned about readability,
> especially given that exposing this feature to extensions is going to set
> all these decisions in concrete forever.

That version I'm talking about actually have got not that much readability from
this separation (at least from my perspective). For extenions it maybe more
justified, but at the same time `isAssignment` flag the only thing you need to
check to figure out whether it's a fetch or assignment operation - it doesn't
sound as something significantly worse for readability. Or do you mean
something else?

Anyway, it's quite possible that I just failed to implement a proper approach
to implement this. I'm going to keep pondering this in case I'll come up with
something better - it should be more or less straightforward to change this
logic at any time.

> * You changed transformArrayType so that it doesn't throw an error if
> the given type isn't an array --- without, I note, updating either the
> function header comment or the internal comment that you falsified.
>
> * That in turn means that where transformAssignmentSubscripts thinks
> it's determining the result type, it may or may not be producing
> InvalidOid instead (again, with no comment to warn the reader).
>
> * And then you had to kludge transformAssignmentIndirection horribly
> (and I'm not sure you kludged it enough, either, because there are
> still a bunch of places where it uses targetTypeId without any concern
> for the possibility that that's zero). It doesn't seem to me to be
> acceptable to just ignore coercion failure, as that code does now.
> If it's no longer the responsibility of this function to guarantee
> that the result is of the right type, why is it attempting coercion
> at all? In any case you failed to update its header comment to
> explain what it's doing differently than before.
>
> In short the division of labor in this area still needs to be thought
> about. I don't think we really want transformAssignmentSubscripts
> determining the required input data type at all; that should be
> farmed out to the datatype-specific code. I'm also pretty unconvinced
> about refnestedfunc --- why do we need that?

Yes, I see, there is a room for improvements. I'm going to post a new version
soon, where I'll try to address this. About `refnestedfunc` - I added it as
part of my modification for functions caching, but looks like now this function
is useless.

> While I'm on the topic, I am not really happy with s/array/container/
> as you've done in some of this code. To my mind, "container type"
> includes composite types. Particularly in the parse_target code, where
> we're currently dealing with either composites or arrays, making it say
> that we're dealing with either composites or containers is just a recipe
> for confusion. Unfortunately I can't think of a better word offhand,
> but some attention to this is needed. As far as the comments go,
> we might be able to use the term "subscriptable type", but not sure if
> that will work for function/variable names.

Is there a plausible use case when "container type" can be something else than
a "composite type"? Maybe it's ok just to say that "container type" is equal to
"composite type"?

> On the executor side of things, I suspect Andres will be unhappy that
> you are making ExprEvalStep part of the API for datatypes --- he
> objected to my exposing it to plpgsql param eval in
> https://postgr.es/m/20171220174243.n4y3hgzf7xd3mm5e@alap3.anarazel.de
> and there was a lot more reason to do so there than there is here, IMO.
> It looks like what you actually need is just the SubscriptingRefState and
> an isnull flag, so it might be better to specify that the fetch and assign
> functions have signatures like
> Datum fetch(Datum val, SubscriptingRefState *state, bool *isnull)
> (representing both of the last two arguments as INTERNAL at SQL level).

Yes, I agree.

> Now on the other hand, maybe the right way to go is to embrace a similar
> approach to what I did for plpgsql param eval, and let the datatype
> control what gets generated as the expression execution step. The main
> point here would be to let the datatype provide the address of a callback
> function that gets executed for a subscripting step, rather than having it
> specify the OID of a pg_proc entry to call.

I had one implementation that resembles this approach. But as far as I see
(please correct me if something in the following chain is wrong) it's necessary
to store pointers to these callback functions in a `SubscriptingRef` node,
which means that they would be missing in functions from
`readfuncs`/`outfuncs`/`copyfuncs`. Is there any situation when this can lead
to undesirable consequences?

> * We'd not have OIDs of execution functions in the parse trees for
> subscripting operations, which would mean that we would not have
> a convenient way to identify subscripting operations that are
> mutable, parallel-unsafe, or leaky. Probably it'd be fine to assume
> that subscripting is always immutable and parallel-safe, although
> I'm slightly more concerned about whether people would want the
> option to label it leaky vs leakproof. As against that, the approach
> that's there right now adds planning overhead that wasn't there before
> for exactly those function property lookups, and again I'm a bit worried
> about the performance impact. (I did some crude performance tests
> today that indicated that the existing patch has small but measurable
> penalties, maybe on the order of 10 percent; and it'd be more by the
> time we're done because I'm pretty sure you've missed some places that
> ought to check these function properties if we're going to have them.
> So I'm afraid that we'll get pushback from people who don't care about
> extensible subscripts and do care about array performance.)

We also wouldn't have a convenient way to specify costs of subscripting
operations. I had in mind a situation, when someone wants to have a quite heavy
subscripting function, which should affect planner's decisions. It maybe a rare
case though.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Catalin Iacob 2018-01-08 17:24:21 Re: Doc tweak for huge_pages?
Previous Message Shubham Barai 2018-01-08 17:14:37 Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index