Tightening selection of default sort/group operators

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Tightening selection of default sort/group operators
Date: 2002-11-29 17:52:02
Message-ID: 27479.1038592322@sss.pgh.pa.us
Lists: pgsql-hackers

I noticed that the system is really pretty shaky about how it chooses
the datatype-specific operators to implement sorting and grouping.
In the GROUP BY case, for example, the parser looks up an operator
named "<" for the column datatype, and then sometime later the executor
looks up an operator named "=" for that datatype, and we blithely assume
that these operators play together and have the expected semantics.
This seems dangerous in a world of user-definable operators. (I think
it's already broken by the standard datatype "tinterval", in fact,
because tinterval's "=" operator doesn't have the semantics of full
equality.)

What I'm thinking of doing instead is always looking up the "=" operator
by name, and accepting this as actually being equality if it is marked
mergejoinable or hashjoinable or has eqsel() as its restriction
selectivity estimator (oprrest). If we are looking for a "<" operator
to implement sorting/grouping, then we require "=" to be mergejoinable,
and we use its lsortop operator (regardless of name).
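
The rule above can be sketched roughly as follows. This is an illustrative
model in Python, not backend code: the Operator class, the toy catalog, and
the datatypes "mytype" and "hashonly" are assumptions invented for the
example, and the real pg_operator catalog stores these properties differently.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Operator:
    """Hypothetical, simplified stand-in for a pg_operator row."""
    name: str
    type_name: str
    mergejoinable: bool = False
    hashjoinable: bool = False
    oprrest: Optional[str] = None   # restriction selectivity estimator
    lsortop: Optional[str] = None   # name of the associated sort operator


# Toy catalog keyed by (operator name, datatype name).
Catalog = Dict[Tuple[str, str], Operator]


def equality_operator(catalog: Catalog, type_name: str) -> Optional[Operator]:
    """Look up "=" by name; accept it only if it is marked as real equality."""
    eq = catalog.get(("=", type_name))
    if eq is None:
        return None
    if eq.mergejoinable or eq.hashjoinable or eq.oprrest == "eqsel":
        return eq
    return None


def ordering_operator(catalog: Catalog, type_name: str) -> Optional[Operator]:
    """Find the sort operator via "="'s lsortop link, never by the name "<"."""
    eq = equality_operator(catalog, type_name)
    if eq is None or not eq.mergejoinable or eq.lsortop is None:
        return None
    return catalog.get((eq.lsortop, type_name))


def build_catalog(*ops: Operator) -> Catalog:
    return {(op.name, op.type_name): op for op in ops}


CATALOG = build_catalog(
    # A well-behaved type: "=" is mergejoinable and points at its sort operator.
    Operator("=", "int4", mergejoinable=True, hashjoinable=True,
             oprrest="eqsel", lsortop="<"),
    Operator("<", "int4"),
    # Hypothetical type whose "=" carries no equality markings at all.
    Operator("=", "mytype", oprrest="some_other_sel"),
    Operator("<", "mytype"),
    # Hypothetical type whose "=" is hashjoinable but not mergejoinable.
    Operator("=", "hashonly", hashjoinable=True),
    Operator("<", "hashonly"),
)

if __name__ == "__main__":
    for t in ("int4", "mytype", "hashonly"):
        eq = equality_operator(CATALOG, t)
        lt = ordering_operator(CATALOG, t)
        print(t, "equality:", eq is not None, "ordering:", lt is not None)
```

In this model "mytype" has operators named "=" and "<", yet neither is
accepted, because nothing marks "=" as equality. "hashonly" has a usable
equality operator but still no sort operator, since its "=" is not
mergejoinable.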

The only standard datatypes for which this would change the behavior
are tinterval, path, lseg, and line --- none of which could be sorted/grouped
correctly with the available operators, anyhow. User-defined datatypes
would stop working as sort/group columns unless the author were careful
to mark the equality operator as mergejoinable, but that's a simple
addition to the operator definition.

Comments, objections?

regards, tom lane
