Re: OK, that's one LOCALE bug report too many...

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: OK, that's one LOCALE bug report too many...
Date: 2000-11-24 22:31:30
Message-ID: 17693.975105090@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Tom Lane writes:
>> I propose, therefore, that in an --enable-locale installation, initdb
>> should save its values for LC_COLLATE and LC_CTYPE in pg_control, and
>> backend startup should restore these settings from pg_control.

> Note that when these are unset there might still be a "catch-all" locale
> value coming from the LANG env. var. (or LC_ALL on some systems).

Actually, what I intend to do while writing pg_control is read the
current effective values via "setlocale(category, NULL)" --- then it
shouldn't matter where they came from, no?
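For illustration, here is a minimal sketch of reading the effective value of one category. The helper name get_effective_locale is hypothetical and this is not the actual initdb or pg_control code; it only shows that a NULL second argument makes setlocale() report the current setting without changing it, whichever environment variable supplied it.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the current effective locale name for one category into buf.
 * Returns 0 on success, -1 if the name is unavailable or does not fit.
 * (Hypothetical helper for illustration; not PostgreSQL code.)
 */
static int
get_effective_locale(int category, char *buf, size_t buflen)
{
    const char *name;

    assert(buf != NULL && buflen > 0);

    /* A NULL second argument queries the locale without changing it. */
    name = setlocale(category, NULL);
    if (name == NULL || strlen(name) >= buflen)
        return -1;

    /* Copy right away: a later setlocale() call may overwrite the result. */
    strcpy(buf, name);
    return 0;
}
```

A caller would first do setlocale(LC_ALL, "") to adopt the environment, then call this once for LC_COLLATE and once for LC_CTYPE and store the two strings.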

This brings up a question I had just come across while doing further
research: backend/main/main.c does

#ifdef USE_LOCALE
    setlocale(LC_CTYPE, "");    /* take locale information from an
                                 * environment */
    setlocale(LC_COLLATE, "");
    setlocale(LC_MONETARY, "");
#endif

which seems a little odd --- why not setlocale(LC_ALL, "")? Karel
Zak said in a thread around 8/15/00 that this is deliberate, but
I don't quite see why.

>> Also, since "LC_COLLATE=en_US" seems to misbehave rather spectacularly
>> on recent RedHat releases, I propose that initdb change "en_US" to "C"
>> if it finds that setting. (Are there any platforms where there are
>> non-bogus differences between the two?)

> There *should* be differences and it is definitely not okay to mix them
> up.

I have now received positive proof that en_US sort order on RedHat is
broken. For example, it asserts
'/root/' < '/root0'
but
'/root/t' > '/root0'
I defy you to find anyone in the US who will say that that is a
reasonable definition of string collation.
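A small sketch of how such a report could be checked on a given machine, using strcoll() under whatever LC_COLLATE is in effect. The function names are made up for this example. The property tested is the one the strings above violate: appending characters to a string that already sorts below another (and is not a prefix of it) should not flip their order.

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Reduce a strcoll() result to -1, 0, or +1. */
static int
collation_sign(const char *a, const char *b)
{
    int r = strcoll(a, b);

    return (r > 0) - (r < 0);
}

/*
 * Return true if extending `shorter` to `extended` leaves its ordering
 * relative to `other` unchanged under the current LC_COLLATE setting.
 * (Hypothetical check for illustration only.)
 */
static bool
order_survives_extension(const char *shorter, const char *extended,
                         const char *other)
{
    return collation_sign(shorter, other) == collation_sign(extended, other);
}
```

Calling order_survives_extension("/root/", "/root/t", "/root0") after setlocale(LC_COLLATE, "") would be expected to return false on a system that shows the behavior described above, and true under the C locale, where comparison is bytewise.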

Of course, if you prefer the notion of disabling LIKE optimization
on a default RedHat installation, we can go ahead and accept en_US.
But I say it's broken and we shouldn't use it.

>> Finally, until we have a really bulletproof solution for LIKE indexing
>> optimization, I will disable that optimization if --enable-locale is
>> compiled *and* LC_COLLATE is not C. Better to get "LIKE is slow" bug
>> reports than "LIKE gives wrong answers" bug reports.

> (C or POSIX)

Do you think there are cases where setlocale(category, NULL) will give back
"POSIX" rather than "C"? We can certainly test for either.
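Testing for either name is cheap. A rough sketch, with hypothetical function names and assuming the locale name comes back as exactly "C" or "POSIX" (a platform could in principle report something else for the same locale):

```c
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if a locale name denotes the plain C/POSIX locale. */
static bool
locale_name_is_c(const char *name)
{
    return name != NULL &&
        (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/*
 * Return true if the current LC_COLLATE setting is C or POSIX, that is,
 * if string comparison can be assumed to be bytewise.
 * (Illustrative only; not the actual backend test.)
 */
static bool
collate_is_c(void)
{
    /* NULL queries the current setting without changing it. */
    return locale_name_is_c(setlocale(LC_COLLATE, NULL));
}
```

The LIKE index optimization would then be applied only when collate_is_c() returns true.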

> I have a question about that optimization: If you have X LIKE 'foo%',
> wouldn't it be enough to use X >= 'foo' (which certainly works for any
> locale I've ever heard of)? Why do you need the X <= 'foo???' at all?

Because you need a two-sided index constraint, not a one-sided one.
Otherwise you're probably better off doing a sequential scan ---
scanning 50% of the table (on average) via an index will be slower
than sequential.
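To make the two-sided constraint concrete, here is a sketch of one way to derive an upper bound from the prefix under bytewise (C locale) ordering. It uses an exclusive bound formed by incrementing the last byte, so X LIKE 'foo%' becomes X >= 'foo' AND X < 'fop'. That is a simplification chosen for this example, not necessarily the bound the planner generates, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Build an exclusive upper bound for all strings starting with `prefix`,
 * assuming bytewise comparison. Returns false if the result does not fit
 * in `upper` or no finite bound exists (empty or all-0xFF prefix).
 */
static bool
make_prefix_upper_bound(const char *prefix, char *upper, size_t upperlen)
{
    size_t len = strlen(prefix);

    if (len + 1 > upperlen)
        return false;
    memcpy(upper, prefix, len + 1);

    /* Drop trailing 0xFF bytes, which cannot be incremented. */
    while (len > 0 && (unsigned char) upper[len - 1] == 0xFF)
        len--;
    if (len == 0)
        return false;

    /* Increment the last remaining byte and terminate the string there. */
    upper[len - 1] = (char) ((unsigned char) upper[len - 1] + 1);
    upper[len] = '\0';
    return true;
}

/* The two-sided test an index scan would apply: lower <= x < upper. */
static bool
in_prefix_range(const char *x, const char *lower, const char *upper)
{
    return strcmp(x, lower) >= 0 && strcmp(x, upper) < 0;
}
```

With only the lower bound, every key from 'foo' to the end of the index would have to be visited. The upper bound is what limits the scan to the matching keys, and it is also the part that depends on the locale sorting the way the bound assumes.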

>> Comments? Anyone think that initdb should lock down more categories
>> than just these two?

> Not sure whether LC_CTYPE is necessary.

I'm not either, but I'm afraid to let it float...

regards, tom lane
