Re: Fixing row comparison semantics

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Fixing row comparison semantics
Date: 2005-12-24 09:55:47
Message-ID: 20051224095547.GA617@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 23, 2005 at 03:18:21PM -0500, Tom Lane wrote:
> I've gotten interested again in the issue of row comparisons, eg
> (a, b, c) >= (1, 2, 3)
> We've discussed this before, the most comprehensive thread being
> http://archives.postgresql.org/pgsql-performance/2004-07/msg00188.php
> but nothing's gotten done. Unless someone's already working on this
> I think I will take it up.

<snip>

Since this is related to the COLLATE stuff I'm working on I'd like to
make a few comments.

> 1. Determine which interpretations (btree strategy numbers) exist for
> each pairwise operator. There must be at least one interpretation that
> is common to all the operators, else fail (for instance, it doesn't help
> if we can identify one operator as "<" and another as ">").

One thing my COLLATE patch does is distinguish between collations and
operator classes. So the reverse operator class issue disappears
because it's just a collation and doesn't need a operator class
(although it won't break anything, see below).

> 2. If there is more than one common interpretation, prefer the one that
> uses the largest number of default opclasses. If there's a tie, we
> could either reject the construct as ambiguous, or select one of the
> possibilities arbitrarily ... any thoughts about that?

In standard SQL, each node in a query has a collation. Columns use the
collation they were given when the table was created, constants use the
default for the type. It's a little more complicated than that, see the
standard for details.

Anyway, a collation identifies a btree operator class so this problem
solves itself. For each pair of values you are comparing, determine the
collation and look up the operator class to ensure you're using the
same strategy type. There are minor details relating to reverse
collations but they're minor.

The only problem reverse operator classes bring here is that the system
won't realise it and thus won't know that the index is usable. Unless
the user specifies the collation as part of the query.

> 3. A given operator could have the selected interpretation in more than
> one opclass. Prefer the default opclass if any; otherwise, again we
> have the choice of rejecting or making an arbitrary choice.

If there's a problem, bail. The standard allows you to specify the
collation on a per node basis so any ambiguities can be resolved by the
user.

So something like:

(a COLLATE hungarian, b COLLATE posix, c COLLATE ignorecase) >= ('x','y','z')

Would know exactly what to do (and if you could use an index)...

Now, since COLLATE support is still in progress, I'm not sure how much
any of this helps you. I'm up to modifying the scankeys but it's hard
when you jave to keep rgrepping the tree to work out what is called
from where...

For other people reading this thread, the reason why it can't be
decomposed into (a>=1 AND b>=2 AND c>=3) is because the standard treats
the row as a unit, checking left to right, so:

(4,0,0) < (5,0,0)
(1,2,3) > (0,7,8)

So it needs a new node type and needs to know which index to use.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Christopher Kings-Lynne 2005-12-24 10:05:58 Re: Fixing row comparison semantics
Previous Message Christopher Kings-Lynne 2005-12-24 09:25:34 Re: Fixing row comparison semantics