Re: OK, that's one LOCALE bug report too many...

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: OK, that's one LOCALE bug report too many...
Date: 2000-11-24 23:18:30
Message-ID: Pine.LNX.4.21.0011242345230.791-100000@peter.localdomain
Lists: pgsql-hackers

Tom Lane writes:

> >> Also, since "LC_COLLATE=en_US" seems to misbehave rather spectacularly
> >> on recent RedHat releases, I propose that initdb change "en_US" to "C"
> >> if it finds that setting. (Are there any platforms where there are
> >> non-bogus differences between the two?)
>
> > There *should* be differences and it is definitely not okay to mix them
> > up.
>
> I have now received positive proof that en_US sort order on RedHat is
> broken. For example, it asserts
> '/root/' < '/root0'
> but
> '/root/t' > '/root0'
> I defy you to find anyone in the US who will say that that is a
> reasonable definition of string collation.

That's certainly very odd, but Unixware does this too, so it's probably
some sort of standard. And a few other European/Latin locales I tried
also do this.

But here's another example of why C and en_US are different.

peter ~$ cat foo
Delta
écrire
Beta
alpha
gamma
peter ~$ LC_COLLATE=C sort foo
Beta
Delta
alpha
gamma
écrire
peter ~$ LC_COLLATE=en_US sort foo
alpha
Beta
Delta
écrire
gamma

The C locale sorts strictly by character code. But in the en_US locale
the accented letter is put into a "natural" position, and the upper and
lower case letters are grouped together. Intuitively, the en_US order is
the order in which you'd look things up in a dictionary.
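
To make the difference concrete without depending on which locales are
installed, here is a minimal sketch that imitates the two orderings. The
helper name dictionary_key is hypothetical, and stripping accents and
folding case is only an approximation of what en_US collation does, not
the actual locale definition.

```python
import unicodedata

# The contents of "foo" from the transcript above; \u00e9 is the accented e.
words = ["Delta", "\u00e9crire", "Beta", "alpha", "gamma"]


def dictionary_key(s):
    """Hypothetical helper: approximate a dictionary-style sort key.

    Accented letters are decomposed and their combining marks dropped,
    then case is folded. The original string is kept as a tie-breaker.
    """
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), s)


# Plain comparison by character code, as LC_COLLATE=C does.
c_order = sorted(words)

# Approximation of the en_US result shown above.
dict_order = sorted(words, key=dictionary_key)

print(c_order)
print(dict_order)
```

With these five words the two lists come out in the same order as the
two sort runs above, which is all the sketch is meant to show.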

This also explains (to me at least) the example you have above: When you
look up words in a dictionary you ignore "funny characters". My American
Heritage Dictionary explains:

: Entries are listed in alphabetical order without taking into account
: spaces or hyphens.

So at least this concept isn't that far out.
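
As a rough illustration of how ignoring "funny characters" yields both of
the comparisons quoted above, here is a sketch with a hypothetical key
function, ignore_punct_key. It assumes that punctuation is skipped on the
first pass and only used to break ties; I have not checked that this is
what the RedHat or Unixware locale definitions actually do.

```python
def ignore_punct_key(s):
    """Hypothetical key: compare letters and digits first, raw string second."""
    primary = "".join(ch for ch in s if ch.isalnum())
    return (primary, s)


a, b, c = "/root/", "/root0", "/root/t"

# By character code, both strings with the extra slash sort before "/root0".
print(a < b, c < b)

# With punctuation ignored: "root" < "root0", but "roott" > "root0".
print(ignore_punct_key(a) < ignore_punct_key(b))
print(ignore_punct_key(c) > ignore_punct_key(b))
```

Under that assumption the pair of results stops looking contradictory:
the slash simply does not take part in the first comparison.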

> Do you think there are cases where setlocale(,NULL) will give back
> "POSIX" rather than "C"? We can certainly test for either.

I know there are (old) systems that reject LANG=C as an invalid locale, but I
don't know what setlocale returns there.
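
Testing for either name is cheap. A minimal sketch of the check, written
in Python rather than C only to keep it short; is_c_locale is a
hypothetical name, and the assumption is that "C" and "POSIX" are the
only two spellings that need to be recognized.

```python
import locale


def is_c_locale(name):
    """Hypothetical check: treat both "C" and "POSIX" as the C locale."""
    return name in ("C", "POSIX")


# Passing None queries the current setting without changing it.
current = locale.setlocale(locale.LC_COLLATE, None)
print(current, is_c_locale(current))
```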

--
Peter Eisentraut peter_e(at)gmx(dot)net http://yi.org/peter-e/
