Allowing extensions to supply operator-/function-specific info

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Allowing extensions to supply operator-/function-specific info
Date: 2019-01-20 23:48:13
Message-ID: 15193.1548028093@sss.pgh.pa.us
Lists: pgsql-hackers

Over in the thread at [1], we realized that PostGIS has been thrashing
around trying to fake its way to having "special index operators", i.e.,
a way to automatically convert WHERE clauses into lossy index quals.
That's existed in a non-extensible way inside indxpath.c for twenty
years come July. Since the beginning I've thought we should provide
a way for extensions to do similar things, but it never got to the top
of the to-do queue. Now I think it's time.

One low-effort answer is to add a hook call in indxpath.c that lets
extensions manipulate the sets of index clauses extracted from a
relation's qual clauses, but I don't especially like that: it dumps
all the work onto extensions, resulting in lots of code duplication,
plus they have a badly-documented and probably moving target for what
they have to do.

Another bit of technical debt that's even older is the lack of a way
to attach selectivity estimation logic to boolean-returning functions.
So that motivates me to think that whatever we do here should be easily
extensible to allow different sorts of function- or operator-related
knowledge to be supplied by extensions. We already have oprrest,
oprjoin, and protransform hooks that allow certain kinds of knowledge
to be attached to operators and functions, but we need something a bit
more general.

What I'm envisioning therefore is that we allow an auxiliary function,
one that can provide functionality like this, to be attached to any
operator or function, and that we set things up so that the set of tasks that
such functions can perform can be extended over time without SQL-level
changes. For example, we could say that the function takes a single
Node* argument, and that the type of Node tells it what to do, and if it
doesn't recognize the type of Node it should just return NULL indicating
"use default handling". We'd start out with two relevant Node types,
one for the selectivity-estimation case and one for the extract-a-lossy-
index-qual case, and we could add more over time.
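
As a rough illustration of that dispatch-on-node-type convention, here is a
minimal standalone sketch. It does not use the PostgreSQL headers: the tag
names, the SupportRequestSelectivity struct, the my_func_support function,
and the 0.01 estimate are all hypothetical placeholders invented for this
example, since the proposal does not fix any of them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request tags; the proposal does not name these. */
typedef enum NodeTag
{
    T_Invalid = 0,
    T_SupportRequestSelectivity,    /* selectivity-estimation request */
    T_SupportRequestIndexQual,      /* extract-a-lossy-index-qual request */
    T_SupportRequestFuture          /* stands in for a request type added later */
} NodeTag;

/* Every request starts with a tag, so one Node * argument can carry any of them. */
typedef struct Node
{
    NodeTag     type;
} Node;

/* Hypothetical selectivity request: the support function fills in the result. */
typedef struct SupportRequestSelectivity
{
    Node        node;           /* must be first, so a Node * can be cast back */
    double      selectivity;    /* output field */
} SupportRequestSelectivity;

/*
 * Support function for some imaginary boolean-returning function.
 * It handles the request types it knows about and returns NULL for
 * everything else, meaning "use default handling".
 */
Node *
my_func_support(Node *req)
{
    assert(req != NULL);

    switch (req->type)
    {
        case T_SupportRequestSelectivity:
            {
                SupportRequestSelectivity *sel = (SupportRequestSelectivity *) req;

                sel->selectivity = 0.01;    /* made-up estimate for the sketch */
                return req;                 /* non-NULL: request was handled */
            }
        default:
            /* Unrecognized (possibly newer) request type. */
            return NULL;
    }
}
```

The point of the NULL default is that a caller can introduce a new request
type later without breaking support functions compiled against the older set.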

What we can do to attach such a support function to a target function
is to repurpose the pg_proc.protransform column to represent the
support function. The existing protransform functions already have
nearly the sort of API I'm thinking about, but they only accept
FuncExpr* not any other node type. It'd be easy to change them
though, because there are only about a dozen and they are all in core;
we never invented any way for extensions to access that functionality.
(So actually, the initial API spec here would include three
possibilities, the third one being equivalent to the current
protransform behavior.)

As for attaching support functions to operators, we could
consider widening the pg_operator catalog to add a new column.
But I think it might be a cleaner answer to just say "use the support
function attached to the operator's implementation function,
if there is one". This would require that the support functions
be able to cope with either FuncExpr or OpExpr inputs, but that
does not seem like much of a burden as long as it's part of the
API spec from day one.
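
The lookup rule described above could be sketched roughly as follows. This is
a standalone mock, not planner code: the struct layouts are trimmed-down
stand-ins for FuncExpr and OpExpr, and proc_table, lookup_prosupport,
support_function_for, and all the OID values are hypothetical names and
numbers made up for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

typedef enum ExprTag
{
    T_FuncExpr,
    T_OpExpr
} ExprTag;

typedef struct Expr
{
    ExprTag     type;
} Expr;

/* Simplified stand-ins: only the fields this sketch needs. */
typedef struct FuncExpr
{
    Expr        expr;
    Oid         funcid;         /* function being called */
} FuncExpr;

typedef struct OpExpr
{
    Expr        expr;
    Oid         opno;           /* the operator itself */
    Oid         opfuncid;       /* operator's implementation function */
} OpExpr;

/* Toy pg_proc: maps a function OID to its support function OID. */
typedef struct ProcEntry
{
    Oid         oid;
    Oid         prosupport;
} ProcEntry;

static const ProcEntry proc_table[] = {
    {1001, 2001},               /* function with a support function */
    {1002, InvalidOid}          /* function without one */
};

Oid
lookup_prosupport(Oid funcid)
{
    size_t      i;

    for (i = 0; i < sizeof(proc_table) / sizeof(proc_table[0]); i++)
    {
        if (proc_table[i].oid == funcid)
            return proc_table[i].prosupport;
    }
    return InvalidOid;
}

/*
 * Find the support function for a call site.  An operator has no entry
 * of its own; it borrows the one attached to its implementation function.
 */
Oid
support_function_for(const Expr *expr)
{
    assert(expr != NULL);

    switch (expr->type)
    {
        case T_FuncExpr:
            return lookup_prosupport(((const FuncExpr *) expr)->funcid);
        case T_OpExpr:
            return lookup_prosupport(((const OpExpr *) expr)->opfuncid);
    }
    return InvalidOid;
}
```

With this rule no new pg_operator column is needed, at the cost that the
support function itself must accept both expression node types.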

Since there isn't any SQL API for attaching support functions,
we'd have to add one, but adding another clause to CREATE FUNCTION
isn't all that hard. (Annoyingly, we haven't created any cheaply
extensible syntax for CREATE FUNCTION, so this'd likely require
adding another keyword. I'm not interested in doing more than
that right now, though.)

I'd be inclined to rename pg_proc.protransform to "prosupport"
to reflect its wider responsibility, and make the new CREATE FUNCTION
clause be "SUPPORT FUNCTION foo" or some such. I'm not wedded
to that terminology, if anyone has a better idea.

One thing that's not entirely clear to me is what permissions would be
required to use that clause. The support functions will have signature
"f(internal) returns internal", so creating them at all will require
superuser privilege, but it seems like we probably also need to restrict
the ability to attach one to a target function --- attaching one to
the wrong function could probably have bad consequences. The easy way
out is to say "you must be superuser"; maybe that's enough for now,
since all the plausible use-cases for this are in extensions containing
C functions anyway. (A support function would have to be coded in C,
although it seems possible that its target function could be something
else.)

Thoughts? If we have agreement on this basic design, making it happen
seems like a pretty straightforward task.

regards, tom lane

PS: there is, however, a stumbling block that I'll address in a separate
message, as it seems independent of this infrastructure.

[1] https://www.postgresql.org/message-id/flat/CACowWR0TXXL0NfPMW2afCKzX++nHHBZLW3-BLusu_B0WjBB1=A@mail.gmail.com
