Re: operator does not exist: character varying[] <> character[]

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: operator does not exist: character varying[] <> character[]
Date: 2014-12-13 23:41:30
Message-ID: 548CCEAA.7000108@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/12/14, 7:16 PM, Tom Lane wrote:
> Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> writes:
>> I'd say that array_eq (and probably _cmp) just needs to be taught to fall back to what oper() does, but this part of the commit message gives me pause:
>
>> "Change the operator search algorithms to look for appropriate btree or hash index opclasses, instead of assuming operators named '<' or '=' have the right semantics."
>
> As it should. array_cmp is the basis for a btree opclass, therefore
> it must *not* use operators that are not themselves btree operators.
>
> Quite aside from that, we need to move even further away from having
> internal system operations depend on operator-name-based lookups;
> see for instance the recent complaints over stuff like IS DISTINCT FROM
> failing for types whose operators aren't in the search path.

Agreed, but in a way that's not what we're doing here; we're trying to utilize an operator the user asked us to use. Of course, array_eq assumes that we want equality; I think that's a problem itself. rowtypes suffer from this too, but presumably it's not as big a deal because we require typid's to match exactly.

> It's arguable that the typcache code should be taught to look for
> binary-compatible opclasses if it can't find one directly for the
> specified type. I'm not sure offhand what rules we'd need to make
> to ensure such a search would yield deterministic results, though.

The risk I see is what happens when someone adds a new operator or cast and suddenly we have multiple paths. That should be fine for regular comparisons, but probably not in an index.

> Another possibility is that we might be able to extend the "text_ops"
> btree operator family to include an opclass entry for varchar, rather than
> relying on binary compatibility to find the text opclass. But that would
> also require some careful thought to understand what the relaxed
> invariants should be for the opfamily structure as a whole. We don't want
> to add more actual operators, for fear of creating ambiguous-operator
> lookup failures.

Yeah, to me that sounds like heading back down the road of assuming = means = and the other fun we had before classes and families... but maybe I'm just being paranoid.

I have an alternative idea, though I'm not sure it's worth the work. Instead of having special array-only operators we could instead apply regular operators to arrays. I believe we can do this and reuse existing operators, if we store an expression of how to combine the result from the previous iteration to the current one (ie: for < this would be (prev AND current), if there is a result value that should stop iteration (for comparison operators that would be false), and what to do with different size arrays. In the last case, you're either going to use that to provide a final result, substitute a specific value for any missing elements, or throw an error.

Pros:
With some additional information in the catalog, we could provide a lot more array operations, using existing operator functions.
We can use the same operator search logic we use for elements. If you can perform an operation on 2 elements and that operator has array support, then we're good to go. If we'd perform casting on the elements, we'd do the same casting with the array values.
We no longer need to assume things like = means equal. If text = int actually meant length(text) = int then as long as that operator had array support you could do text[] = int[].
This keeps all operator logic together, regardless of whether the input is an array or a base element. Not

Cons:
Could this cause problems in the planner? Selectivity estimation comes to mind.
transformAExprOp/make_op would need to do something different for arrays, because the function would need more information. If we can just extend OpExpr then maybe this isn't a big deal. (A simple grep shows 401 instances of OpExpr in src/backend).
I don't think we could use this same technique with rowtypes.
We'd still allow for array-specific operators; perhaps that might cause issues or at least confusion.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2014-12-14 00:17:32 Re: pgbench -f and vacuum
Previous Message Fabrízio de Royes Mello 2014-12-13 23:16:29 Re: CINE in CREATE TABLE AS ... and CREATE MATERIALIZED VIEW ...