Re: [HACKERS] [PATCH] Generic type subscripting

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Generic type subscripting
Date: 2018-01-07 22:39:00
Message-ID: 5571.1515364740@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dmitry Dolgov <9erthalion6(at)gmail(dot)com> writes:
>> On 4 January 2018 at 03:05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I wonder what happened to the plan to separate assignment and fetch into two
>> different node types. I can see that that didn't happen so far as primnodes.h
>> is concerned, but you've made some changes that seem to assume it did happen.

> There was one version of this patch that followed this plan. It turns out that
> it's quite unnatural approach (at least within the current implementation),
> because I had to duplicate or reuse a lot of code for those two node types.

I'm not entirely convinced by that argument. Sure, there would be a lot of
duplicative code in the node support functions, but that's inherent in our
approach to those functions. I'm more concerned about readability,
especially given that exposing this feature to extensions is going to set
all these decisions in concrete forever.

> Here is a new rebased version of the patch
> with incorporated improvements that you've mentioned.

I spent a couple of hours looking through this. I'm still pretty unhappy
with the state of the parser APIs. In the first place, you haven't
documented what those APIs are in any meaningful way. I do not think it's
adequate to tell people to go read array_subscript_parse as the first and
only way to understand what a subscript parse function must do. We do
often expect extension authors to pick up small details that way, but
there should be a textual API spec of some sort --- for example, look at
the index AM API specs in indexam.sgml, which give pretty clear high-level
definitions of what each AM API function has to do.

Part of the reason why I'm insistent on that is that I think it will
expose that the division of labor between the core parser and the
datatype-specific parse function is still a mess. One particular sore
spot is the question of who decides what the return data type of a
subscripting function is. Right now you seem to be making that decision
in the core parser, at least for the assignment case, which is pretty
bad from an extensibility standpoint and also leads to all of this
messiness:

* You changed transformArrayType so that it doesn't throw an error if
the given type isn't an array --- without, I note, updating either the
function header comment or the internal comment that you falsified.

* That in turn means that where transformAssignmentSubscripts thinks
it's determining the result type, it may or may not be producing
InvalidOid instead (again, with no comment to warn the reader).

* And then you had to kludge transformAssignmentIndirection horribly
(and I'm not sure you kludged it enough, either, because there are
still a bunch of places where it uses targetTypeId without any concern
for the possibility that that's zero). It doesn't seem to me to be
acceptable to just ignore coercion failure, as that code does now.
If it's no longer the responsibility of this function to guarantee
that the result is of the right type, why is it attempting coercion
at all? In any case you failed to update its header comment to
explain what it's doing differently than before.

In short the division of labor in this area still needs to be thought
about. I don't think we really want transformAssignmentSubscripts
determining the required input data type at all; that should be
farmed out to the datatype-specific code. I'm also pretty unconvinced
about refnestedfunc --- why do we need that?

I also notice that you still haven't done anything about the problem
of the subscripting operation possibly yielding a different typmod
or collation than the container type has. It was okay to let that
go for awhile, but we're not shipping it like this, because it's
going to be awfully hard to change that struct type once extensions
are building them.

While I'm on the topic, I am not really happy with s/array/container/
as you've done in some of this code. To my mind, "container type"
includes composite types. Particularly in the parse_target code, where
we're currently dealing with either composites or arrays, making it say
that we're dealing with either composites or containers is just a recipe
for confusion. Unfortunately I can't think of a better word offhand,
but some attention to this is needed. As far as the comments go,
we might be able to use the term "subscriptable type", but not sure if
that will work for function/variable names.

On the executor side of things, I suspect Andres will be unhappy that
you are making ExprEvalStep part of the API for datatypes --- he
objected to my exposing it to plpgsql param eval in
https://postgr.es/m/20171220174243.n4y3hgzf7xd3mm5e@alap3.anarazel.de
and there was a lot more reason to do so there than there is here, IMO.
It looks like what you actually need is just the SubscriptingRefState and
an isnull flag, so it might be better to specify that the fetch and assign
functions have signatures like
Datum fetch(Datum val, SubscriptingRefState *state, bool *isnull)
(representing both of the last two arguments as INTERNAL at SQL level).

Now on the other hand, maybe the right way to go is to embrace a similar
approach to what I did for plpgsql param eval, and let the datatype
control what gets generated as the expression execution step. The main
point here would be to let the datatype provide the address of a callback
function that gets executed for a subscripting step, rather than having it
specify the OID of a pg_proc entry to call. There would be two big wins
from that:

* The callback function would have a plain C call signature, so we would
not have to go through FunctionCallN, saving a few cycles. This is
attractive because it would pretty much eliminate any concern about this
patch making array access slower at execution time.

* There would no longer be a wired-in restriction that there be two and
only two subscripting execution functions per datatype, since there would
not be any need for those functions to be identified in pg_type.

Basically, with this approach, a subscriptable data type would need to
provide two cataloged support functions: parse, as we have now, and
compile. Actual execution functions would be outside that. (Possibly
we could merge the support functions into one function that takes an
operation code, similar to one of your earlier designs. Not sure that
that's better, but it'd be easier to extend in future if we decide we
need three support operations...)

The two disadvantages I can see of approaching things this way are:

* There'd be at least some connection of subscriptable types to
expression compilation, which is what Andres was objecting to in the
message I cited above. Possibly we could alleviate that a bit by
providing helper functions that mask exactly what goes into the
expression step structs, but I'm not sure that that gets us far.

* We'd not have OIDs of execution functions in the parse trees for
subscripting operations, which would mean that we would not have
a convenient way to identify subscripting operations that are
mutable, parallel-unsafe, or leaky. Probably it'd be fine to assume
that subscripting is always immutable and parallel-safe, although
I'm slightly more concerned about whether people would want the
option to label it leaky vs leakproof. As against that, the approach
that's there right now adds planning overhead that wasn't there before
for exactly those function property lookups, and again I'm a bit worried
about the performance impact. (I did some crude performance tests
today that indicated that the existing patch has small but measurable
penalties, maybe on the order of 10 percent; and it'd be more by the
time we're done because I'm pretty sure you've missed some places that
ought to check these function properties if we're going to have them.
So I'm afraid that we'll get pushback from people who don't care about
extensible subscripts and do care about array performance.)

So roughly speaking, I'm imagining that we'd go back to a design similar
to one I recall you had at one point, where there's a single SQL-visible
subscripting support function per datatype, with a signature like

subscript_support(int opcode, internal other_info) returns internal

but the opcodes would now be "parse" and "compile". Actual execution
would use callback functions that don't have to be SQL-visible. This is
closer to the approach we've been using of late for things like AM APIs:
to the extent possible, there's just one SQL-registered handler function
and all else is a callback. Actually, we could make it *exactly* like
that, and have the registered handler give back a struct full of function
pointers rather than doing anything much itself. Maybe that's an even
better way to go.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-01-07 23:00:25 Re: Expression based aggregate transition / combine function invocation
Previous Message Everaldo Canuto 2018-01-07 21:21:52 Re: proposal: alternative psql commands quit and exit