Fixing row comparison semantics

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Fixing row comparison semantics
Date: 2005-12-23 20:18:21
Message-ID: 22577.1135369101@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've gotten interested again in the issue of row comparisons, eg
(a, b, c) >= (1, 2, 3)
We've discussed this before, the most comprehensive thread being
http://archives.postgresql.org/pgsql-performance/2004-07/msg00188.php
but nothing's gotten done. Unless someone's already working on this
I think I will take it up.

I said in that thread that I'd like to see the implementation handle
nonstandard operators, for example
(a, b) ~<~ ('foo', 'bar')
ought to match the sorting semantics of an index using text_pattern_ops.
This means that we need some way of figuring out the semantics to
associate with the row comparison. I think the individual pairwise
comparison operators could be looked up in the normal way based on the
datatypes of the left and right inputs, but the row comparison itself
has to understand whether it's doing "=", "<", etc in order to apply the
operators in the right way. We can't just rely on the operator name
if we are to support nonstandard operators.

I think the right way to deal with it is to look up the pairwise
operators in pg_amop to see if they are members of btree index
opclasses. This means that the row-wise construct would fail for any
operators that are not in opclasses, but that doesn't seem like a
serious problem in practice. (Note: we can handle <> by seeing if the
operator has a negator that is btree equality. All this logic exists
already in predtest.c.)

The tricky part is what if the operators are found in more than one
opclass --- for example, suppose someone has installed a reverse-sort
opclass? I suggest the following rules:

1. Determine which interpretations (btree strategy numbers) exist for
each pairwise operator. There must be at least one interpretation that
is common to all the operators, else fail (for instance, it doesn't help
if we can identify one operator as "<" and another as ">").

2. If there is more than one common interpretation, prefer the one that
uses the largest number of default opclasses. If there's a tie, we
could either reject the construct as ambiguous, or select one of the
possibilities arbitrarily ... any thoughts about that?

3. A given operator could have the selected interpretation in more than
one opclass. Prefer the default opclass if any; otherwise, again we
have the choice of rejecting or making an arbitrary choice.

Notice that I'm assuming we need to identify a specific opclass for each
pairwise operator. This is not strictly necessary for the = and <>
cases, because there you just need the given operator; but it is
necessary for the < <= > >= cases, where you must have a notion of
equality as well as the particular inequality operator. It may be worth
stopping once we've identified the common interpretation if it's = or <>.

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Fuhr 2005-12-23 20:18:34 Re: where is the output
Previous Message Tom Lane 2005-12-23 19:32:10 Re: postmaster and postgres options assimilation