Re: OK, that's one LOCALE bug report too many...

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: PostgreSQL Development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: OK, that's one LOCALE bug report too many...
Date: 2000-11-24 23:45:18
Message-ID: 18097.975109518@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
>> I have now received positive proof that en_US sort order on RedHat is
>> broken. For example, it asserts
>> '/root/' < '/root0'
>> but
>> '/root/t' > '/root0'
>> I defy you to find anyone in the US who will say that that is a
>> reasonable definition of string collation.

> That's certainly very odd, but Unixware does this too, so it's probably
> some sort of standard. And a few other European/Latin locales I tried
> also do this.

I don't have very many platforms to try, but HPUX does not think that
en_US sorts that way. It may well be standard in some European locales,
but there's a reason why C locale acts the way it does: that behavior is
the accepted one on this side of the pond. Sufficiently well accepted
that it was quite a few years before American programmers noticed there
was any reason to behave differently ;-)

> This also explains (to me at least) the example you have above: When you
> look up words in a dictionary you ignore "funny characters". My American
> Heritage Dictionary explains:
> : Entries are listed in alphabetical order without taking into account
> : spaces or hyphens.

That's workable for an English dictionary, where symbols other than
letters are (a) rare and (b) usually irrelevant to the meaning. Do
you think anyone would tolerate treating "/" as a noise character in a
listing of Unix filenames, to take one counterexample? Unfortunately,
en_US does so.
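
To make the disputed ordering concrete, here is a minimal sketch in Python.
It does not call the real en_US locale (which may not be installed, and
differs between platforms). Instead it assumes a simplified two-pass
collation: first compare with every non-alphanumeric character dropped,
then break ties on the raw string. The function names are hypothetical
and exist only for this illustration.

```python
def c_locale_key(s: str) -> bytes:
    """C-locale style ordering: compare the raw bytes, nothing ignored."""
    return s.encode("ascii")


def noise_ignoring_key(s: str) -> tuple:
    """Assumed simplification of a dictionary-style collation.

    Pass 1 compares only letters and digits ("/" is treated as noise).
    Pass 2 (the raw string) only breaks ties between equal pass-1 keys.
    """
    primary = "".join(ch for ch in s if ch.isalnum())
    return (primary, s)


paths = ["/root0", "/root/t", "/root/"]

# Byte order keeps everything under "/root/" together.
c_sorted = sorted(paths, key=c_locale_key)

# The noise-ignoring order puts "/root0" between "/root/" and "/root/t".
noisy_sorted = sorted(paths, key=noise_ignoring_key)
```

Under these assumptions the noise-ignoring key reproduces both comparisons
from the report: '/root/' sorts before '/root0' because "root" is a prefix
of "root0", while '/root/t' sorts after '/root0' because "roott" follows
"root0". Byte order places both '/root/' strings before '/root0'.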

This'd be less of a problem if we had support for per-column charset
and locale specifications. There'd be no objection to sorting a column
that way if it contains only (or mostly) words. But I've got strong
doubts that the average user of a default RedHat installation expects
*all* data to get sorted that way, or that he wants us to honor a
default that he didn't ask for to the extent of disabling LIKE
optimization to make it work.
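
For readers wondering why the sort order and the LIKE optimization are
tied together, here is a rough sketch in Python. It assumes the
optimization rewrites a prefix pattern such as LIKE '/root/%' into a range
scan over sorted data, from the prefix up to the prefix with its last
character incremented. The helper names and the simplified noise-ignoring
collation are hypothetical stand-ins, not PostgreSQL's actual code.

```python
import bisect


def noise_ignoring_key(s: str) -> tuple:
    """Assumed simplification: compare alphanumerics first, raw string second."""
    return ("".join(ch for ch in s if ch.isalnum()), s)


def prefix_upper_bound(prefix: str) -> str:
    """Smallest string (in byte order) greater than every string with this prefix."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def range_scan(rows, prefix, key):
    """Emulate an index range scan: prefix <= row < upper bound, in `key` order."""
    ordered = sorted(rows, key=key)
    keys = [key(r) for r in ordered]
    lo = bisect.bisect_left(keys, key(prefix))
    hi = bisect.bisect_left(keys, key(prefix_upper_bound(prefix)))
    return ordered[lo:hi]


rows = ["/root/", "/root/t", "/root0", "/rootx"]
prefix = "/root/"

# What LIKE '/root/%' should return, computed the slow way.
expected = [r for r in rows if r.startswith(prefix)]

# Byte (C locale) order: the range scan finds every match.
c_result = range_scan(rows, prefix, key=lambda s: s)

# Noise-ignoring order: "/root/t" sorts after "/root0", so the scan misses it.
noisy_result = range_scan(rows, prefix, key=noise_ignoring_key)
```

In this sketch the range scan is only correct when the index order agrees
with byte order. With the noise-ignoring order it silently drops
'/root/t', which is why honoring such a locale would mean giving up the
optimization rather than returning wrong answers.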

I suppose we could do it that way and add a FAQ entry:

Q. Why are my LIKE queries so slow?

A. Change your locale to C, then dump, initdb, reload.

But somehow I don't think that'll go over well...

regards, tom lane
