Re: [HACKERS] [PATCH] Generic type subscripting

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Generic type subscripting
Date: 2018-01-22 22:38:23
Message-ID: CA+q6zcUK4EqPAu7XRRO5CCjMwhz5zvg+rfWuLzVoxp_5sKS6=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Sorry for late reply. I've attached a new version of the patch with following
changes:

> On 7 January 2018 at 23:39, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Part of the reason why I'm insistent on that is that I think it will
> expose that the division of labor between the core parser and the
> datatype-specific parse function is still a mess.

I moved all logic that busy with determining required input data type into the
datatype specific code (except everything domain related). Unfortunately, we
need some of those types before the custom code can perform validation - to
address this `parse` function now splitted into two phases, initial preparation
(when the custom code can provide all required information), and actual
validation (when everything, e.g. rhs is processed). In general I hope this
approach makes separation of concerns more clear.

> On the executor side of things, I suspect Andres will be unhappy that
> you are making ExprEvalStep part of the API for datatypes --- he
> objected to my exposing it to plpgsql param eval in
> https://postgr.es/m/20171220174243.n4y3hgzf7xd3mm5e@alap3.anarazel.de
> and there was a lot more reason to do so there than there is here, IMO.

I've changes fetch/assign functions so that ExprEvalStep isn't exposed anymore.

> Now on the other hand, maybe the right way to go is to embrace a similar
> approach to what I did for plpgsql param eval, and let the datatype
> control what gets generated as the expression execution step. The main
> point here would be to let the datatype provide the address of a callback
> function that gets executed for a subscripting step, rather than having it
> specify the OID of a pg_proc entry to call. There would be two big wins
> from that:
>
> * The callback function would have a plain C call signature, so we would
> not have to go through FunctionCallN, saving a few cycles. This is
> attractive because it would pretty much eliminate any concern about this
> patch making array access slower at execution time.
>
> * There would no longer be a wired-in restriction that there be two and
> only two subscripting execution functions per datatype, since there would
> not be any need for those functions to be identified in pg_type.
>
> The two disadvantages I can see of approaching things this way are:
>
> * There'd be at least some connection of subscriptable types to
> expression compilation, which is what Andres was objecting to in the
> message I cited above. Possibly we could alleviate that a bit by
> providing helper functions that mask exactly what goes into the
> expression step structs, but I'm not sure that that gets us far.
>
> * We'd not have OIDs of execution functions in the parse trees for
> subscripting operations, which would mean that we would not have
> a convenient way to identify subscripting operations that are
> mutable, parallel-unsafe, or leaky. Probably it'd be fine to assume
> that subscripting is always immutable and parallel-safe, although
> I'm slightly more concerned about whether people would want the
> option to label it leaky vs leakproof. As against that, the approach
> that's there right now adds planning overhead that wasn't there before
> for exactly those function property lookups, and again I'm a bit worried
> about the performance impact. (I did some crude performance tests
> today that indicated that the existing patch has small but measurable
> penalties, maybe on the order of 10 percent; and it'd be more by the
> time we're done because I'm pretty sure you've missed some places that
> ought to check these function properties if we're going to have them.
> So I'm afraid that we'll get pushback from people who don't care about
> extensible subscripts and do care about array performance.)

I tired to apply this approach with callback functions for fetch/assign logic
(these functions I mentioned above for prepare/validate are also implemented
like that). This part is actually can be done more or less independently from
changes above. Is it close to what you suggesting?

As a side note, I'm not sure why this main function should have a signature
with opcode:

subscript_support(int opcode, internal other_info) returns internal

since instead of calling it with different opcode we can just return a set of
callback and execute what's necessary at this particular point.

I haven't evaluated performance of this implementation yet, will do that soon.
But in the meantime I want to align on what can be accepted as the best solution
here.

> Yeah, I'm beginning to wonder if we should do the renaming at all.
> It's useful for being sure we've found everyplace that needs to change
> ... but if lots of those places don't actually need more than the
> name changes, maybe it's just make-work and code thrashing.

I'm strongly in favor of renaming, just because I don't feel comfortable
when a name of a concept is not exactly deliver the meaning. In the current
version I kept the name "container" so far due lack of feasible alternatives.

> While I'm on the topic, I am not really happy with s/array/container/
> as you've done in some of this code. To my mind, "container type"
> includes composite types. Particularly in the parse_target code, where
> we're currently dealing with either composites or arrays, making it say
> that we're dealing with either composites or containers is just a recipe
> for confusion. Unfortunately I can't think of a better word offhand,
> but some attention to this is needed. As far as the comments go,
> we might be able to use the term "subscriptable type", but not sure if
> that will work for function/variable names.
>
> After further thought, I think I'm prepared to say (for the moment) that
> only true arrays need be deemed to be containers in this sense. If you
> make a subscripting function for anything else, we'll treat it as just a
> function that happens to yield the result type but doesn't imply that that
> is what is physically stored. Perhaps at some point that will need to
> change, but I'm failing to think of near-term use cases where it would be
> important to have such a property.
>
> This is, however, a good reason why I don't like the use of "container"
> terminology in the patch. I think we want to reserve "container" for
> types where physical containment is assumed.

I see the point. But to my understanding all other datatypes are "containers"
too, since a subscripting function most likely will return some data from it,
maybe in some transformed form. Arrays are just more strict containers, so
maybe reserve "typed/strict_container" for them?

One more note. In the current version of the patch I haven't updated a tutorial
part, since as I said I want to discuss and have an agreement on details stated
above. Also, this idea about separating renaming stuff from everything else is
actually paid off - I found out few more places where I forgot to revert some
remnants of the implementation with two separated node types. I'm going to fix
them within few days in the next version.

Attachment Content-Type Size
0001-Renaming-for-new-subscripting-mechanism-v5.patch application/octet-stream 47.3 KB
0002-Base-implementation-of-subscripting-mechanism-v5.patch application/octet-stream 126.9 KB
0003-Subscripting-for-array-v5.patch application/octet-stream 12.8 KB
0004-Subscripting-for-jsonb-v5.patch application/octet-stream 32.9 KB
0005-Subscripting-documentation-v5.patch application/octet-stream 18.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2018-01-22 22:41:06 Re: jsonpath
Previous Message Petr Jelinek 2018-01-22 22:18:20 Re: [PATCH] Logical decoding of TRUNCATE