Operator class group proposal

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Operator class group proposal
Date: 2006-12-13 21:27:09
Message-ID: 11695.1166045229@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

After further thought about the mergejoinable-operators issue and some
other longstanding planner problems, I have a modest proposal to make:
we should invent the notion of "operator class groups", which identify
sets of compatible operator classes. (I'm not wedded to the name "class
group"; it seems a bit redundant. Anyone have a better idea?) We've
touched on related ideas in past threads but never really put together
a concrete proposal. Here is one.

A class group is associated with a specific index AM and can contain only
opclasses for that AM. We might for instance invent "numeric" and
"numeric_reverse" groups for btree, to contain the default opclasses and
reverse-sort opclasses for the standard arithmetic types.

We further require that any given opclass be a member of at most one class
group (this simplifies matters, and there isn't any application I can see
for one opclass being in more than one group), and that a class group
contain at most one opclass for a given datatype (ditto). Recall that
the member operators of an opclass all have the opclass' indexable
datatype as their lefthand inputs, but the righthand input datatype can
vary. These rules imply that any given operator can stand in at most one
relationship to a class group: there is only one potential class member it
could be in. It's still possible for an operator to appear in different
groups (e.g., within both normal and reverse-sort groups).

For btree opclasses, if two opclasses are in the same group, it means that
they have compatible sort ordering and equality rules. In particular, for
any "=" operators of member opclasses and values A, B, C of the
appropriate datatypes, we require
A = B and B = C implies A = C
Likewise for "<" operators:
A < B and B < C implies A < C
and likewise for <=, >, >= (I think these cases are implied if the first
two must hold).

I believe these rules are sufficient to guarantee that any "=" operator in
the opclass group can be mergejoined using "<" and ">" operators taken
from the appropriate member opclasses. Therefore we can entirely eliminate
the oprlsortop etc. fields of pg_operator and rely on looking into the
opclass catalogs to determine how to do mergejoins. This is an
improvement over the current situation (and also my proposal of yesterday)
because it will be possible to support both ascending and descending
mergejoins. In the current structure we can only associate one lsortop
("<") with any "=", but if there are reverse-sort opclasses in existence
it would be nice to be able to perform a mergejoin using ">" as well.

Furthermore, the class group structure provides a solid foundation for the
planner's transitive-equality deductions, which are currently pretty bogus
as discussed yesterday.

Lastly, the group structure fixes a nasty problem in predicate testing:
currently, we tend to fail to prove implications involving cross-datatype
operators because we can't find operators that are in the same opclass and
compare the two types we need to compare. With class groups, we can use
any two operators in the same class group to make a deduction. (This also
cleans up btree scankey analysis for cross-type situations; it has the
same problem of not knowing how to determine which key values are lesser
or greater.)

For hash opclasses, placing two opclasses in the same group means that
they have compatible hash functions. This allows us to support hashing of
cross-type comparisons. As an example, "int4 eq int8" can be hashed if
the int8 hash function guarantees that for any int8 value that could fit
in int4, it will produce the same hash code as the int4 hash function
does. (Which, in fact, our current integer hash functions are designed to
do.) Currently we cannot exploit this because we have no way to identify
which two hash functions to use. With opclass groups, we could put
cross-type equality operators into hash opclasses the same way as we do
for btree opclasses, and the rule would be to use the hash functions for
the group's opclasses matching the left and right datatypes.

I don't have any idea whether opclass groups would be useful for GiST or
GIN indexes, but maybe Oleg and Teodor can think of applications.

As far as syntax goes, I was envisioning

CREATE/DROP OPERATOR CLASS GROUP name USING index_method;

and adding an optional "GROUP name" clause to CREATE OPERATOR CLASS.
We could imagine adding a lot of additional stuff such as being able to
ALTER an existing opclass' group membership, but at least for the first
cut I see no need for anything more than this. (In particular, since we'd
probably only allow these commands to superusers, we needn't bother with a
concept of ownership for class groups.)

In the system catalogs, we'd need to add a "pg_opclassgroup" catalog
with OIDs and columns for name, namespace, and index AM's OID; and
pg_opclass would get a new column for the OID of the class's group,
or 0 if not in any group (which really effectively means it's a group
of one opclass).

BTW, I'm also pretty strongly tempted to denormalize the catalogs a bit
by adding group OID and index AM OID to individual pg_amop entries,
thereby reducing lookup time when trying to identify an operator's
properties. Currently we need an extra probe into pg_opclass to see
which index AM the opclass belongs to, and with this proposal we'd also
be looking at the group OID quite a lot.

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-12-13 21:32:52 Re: Better management of mergejoinable operators
Previous Message Jim C. Nasby 2006-12-13 21:11:12 Re: Vacuum, analyze, and setting reltuples of pg_class