Re: Removing SORTFUNC_LT/REVLT

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Removing SORTFUNC_LT/REVLT
Date: 2005-12-31 10:38:30
Message-ID: 20051231103824.GA2423@svana.org
Lists: pgsql-hackers

On Sat, Dec 31, 2005 at 12:58:19AM -0500, Greg Stark wrote:
> I think this is a mistake -- the same mistake that got us into trouble with
> Turkish.
>
> Hashing depends on the concept of equality which is integral to the type. Two
> things are either the same or they aren't, and that can't change based on
> context.

So someone who wants a case-insensitive search actually doesn't want
"Foo" to equal "foo"? If you're arguing that that should be a different
type, well, that's a possibility. But does that mean someone who wants
an accent-insensitive match also needs a new type? And a phonebook
match, where "Mc" and "Mac" are the same?

It was my understanding that the problem with Turkish/Hungarian was that
we only allow one collation for strings over the whole database. The
point is that in the future you will be able to select this on a
per-column/index/query basis, so we don't need to stick to such a
restriction if the user explicitly asks to ignore it.

On a more practical level, a Hash Join needs to produce the same
results as a Merge Join, so if (a = b) then (hash(a) = hash(b)). So if
the user types (a = b COLLATE ignorecase) then the hash function needs
to change to match.
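A minimal sketch of that invariant, written in Python rather than PostgreSQL's C and not taken from the PostgreSQL source: the "ignorecase" collation is modelled with `str.casefold()`, and every function name here (`ci_equal`, `ci_hash`, `raw_hash`, `hash_join`, `nested_loop_join`) is hypothetical. It compares a toy hash join against a nested-loop join that needs no hashing at all.

```python
import zlib
from collections import defaultdict

# Hypothetical "ignorecase" collation, modelled with str.casefold().
def ci_equal(a: str, b: str) -> bool:
    return a.casefold() == b.casefold()

def ci_hash(s: str) -> int:
    # Hash the same normalized form that equality compares, so that
    # ci_equal(a, b) implies ci_hash(a) == ci_hash(b).
    return zlib.crc32(s.casefold().encode("utf-8"))

def raw_hash(s: str) -> int:
    # Hashes the exact characters: only consistent with exact equality.
    return zlib.crc32(s.encode("utf-8"))

def hash_join(left, right, hash_fn, eq_fn):
    """Toy hash join: build buckets from `left`, probe with `right`."""
    buckets = defaultdict(list)
    for l in left:
        buckets[hash_fn(l)].append(l)
    matches = []
    for r in right:
        # Only rows in the same bucket are compared; eq_fn rechecks them.
        for l in buckets.get(hash_fn(r), []):
            if eq_fn(l, r):
                matches.append((l, r))
    return matches

def nested_loop_join(left, right, eq_fn):
    """Reference join: compare every pair, no hashing involved."""
    return [(l, r) for r in right for l in left if eq_fn(l, r)]

LEFT = ["Foo", "bar"]
RIGHT = ["foo", "BAR", "baz"]

if __name__ == "__main__":
    print(nested_loop_join(LEFT, RIGHT, ci_equal))
    print(hash_join(LEFT, RIGHT, ci_hash, ci_equal))   # same rows
    print(hash_join(LEFT, RIGHT, raw_hash, ci_equal))  # misses both matches
```

With `raw_hash`, "Foo" and "foo" land in different buckets, so the join silently returns fewer rows than the nested-loop join would, even though the equality function is the same in both. Switching the collation means switching the hash function along with it.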

> Specifically in the case of strings, two strings should only be considered
> "equal" if they consist of the exact same series of characters. (That is, they
> could be encoded differently but they have to encode the same actual
> characters.) That they happen to sort equally compared to all other strings
> doesn't mean that they're equal.

Sure, for straight strings (COLLATE POSIX), that's absolutely a
requirement. But people also have other requirements, like treating
strings case-insensitively. I don't think we should restrict ourselves
to not being able to support their wishes.

You do bring up the possibility of secondary sort functions: functions
which are not involved in testing for equality, but provide additional
sorting so that, even in a case-insensitive sort, the different
variations in case appear together. An "all variations are equal, but
some are more equal than others" type of setup.
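A rough sketch of such a two-level ordering, again in Python and with hypothetical names (`primary_key`, `sort_key`): the primary key alone decides equality, and the exact string is used only as a tie-breaker when sorting. Ordering the case variants by code point (uppercase before lowercase) is an arbitrary assumption here; a real collation would define its own secondary rule.

```python
# Hypothetical case-insensitive collation with a secondary sort key.
def primary_key(s: str) -> str:
    # Decides equality: case differences are ignored.
    return s.casefold()

def ci_equal(a: str, b: str) -> bool:
    return primary_key(a) == primary_key(b)

def sort_key(s: str):
    # Primary key first; ties are broken by the exact string, so case
    # variants of one word end up adjacent and in a fixed order.
    return (primary_key(s), s)

words = ["foo", "Bar", "FOO", "bar", "Foo"]

if __name__ == "__main__":
    print(sorted(words, key=primary_key))  # ['Bar', 'bar', 'foo', 'FOO', 'Foo']
    print(sorted(words, key=sort_key))     # ['Bar', 'bar', 'FOO', 'Foo', 'foo']
    print(ci_equal("FOO", "foo"))          # True
```

Sorting on the primary key alone already groups the variants, but their order within a group depends on the input; the secondary key makes it deterministic without changing which strings compare equal.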

Thanks for the feedback,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
