Re: Locale implementation questions

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: kleptog(at)svana(dot)org
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, gsstark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Locale implementation questions
Date: 2005-09-04 13:25:36
Message-ID: 20050904.222536.39155679.ishii@sraoss.co.jp
Lists: pgsql-hackers

> 3. Compiled locale files are large. One UTF-8 locale datafile can
> exceed a megabyte. Do we want the option of disabling it for small
> systems?

To avoid the problem, you could dynamically load the compiled
tables. The charset conversion tables are handled in a similar way.
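
A minimal sketch of the load-on-demand idea, assuming a hypothetical
registry of compiled tables. The names LocaleTable, load_weights and
get_locale_table are made up for illustration and are not PostgreSQL
APIs; a real loader would read or map a compiled file, or load a
shared module, instead of filling in dummy weights:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one compiled locale table. */
typedef struct LocaleTable
{
    const char    *name;      /* locale name */
    unsigned char *weights;   /* sort weights; NULL until first use */
    size_t         nweights;
} LocaleTable;

/* Counts real loads, only to make the laziness observable. */
static int load_count = 0;

/*
 * Stand-in loader (assumption): fills in identity weights. A real one
 * would read the compiled data from disk.
 */
static int
load_weights(LocaleTable *t)
{
    t->nweights = 256;
    t->weights = malloc(t->nweights);
    if (t->weights == NULL)
        return -1;
    for (size_t i = 0; i < t->nweights; i++)
        t->weights[i] = (unsigned char) i;
    load_count++;
    return 0;
}

/* Known locales; none of them costs memory until it is requested. */
static LocaleTable registry[] = {
    {"C", NULL, 0},
    {"ja_JP.UTF-8", NULL, 0},
};

/* Return the table for a locale, loading it on first request only. */
const LocaleTable *
get_locale_table(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(registry) / sizeof(registry[0]); i++)
    {
        LocaleTable *t = &registry[i];

        if (strcmp(t->name, name) != 0)
            continue;
        if (t->weights == NULL && load_weights(t) != 0)
            return NULL;        /* load failed */
        return t;
    }
    return NULL;                /* unknown locale */
}
```

With this shape a small system pays only for the locales it actually
uses, and a second request for the same locale reuses the table that
is already in memory.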

Also I think it's important to allow user-defined collation data. To
implement the CREATE COLLATE syntax, we need to have that capability
anyway.

> 4. Do we want the option of running system locale in parallel with the
> internal ones?
>
> 5. I think we're going to have to deal with the very real possibility
> that our locale database will not be as good as some of the system
> provided ones. The question is how. This is quite unlike timezones
> which are quite standardized and rarely change. That database is quite
> well maintained.
>
> Would people object to a configure option that selected:
> --with-locales=internal (use pg database)
> --with-locales=system (use system database for win32, glibc or MacOS X)
> --with-locales=none (what we support now, which is neither)
>
> I don't think it will be much of an issue to support this, all the
> functions take the same parameters and have almost the same names.

To be honest, I don't understand why we have to rely on (often broken)
system locales. I don't think building our own locale data is too
hard, and once we have built it, the maintenance cost will be very small
since it should not change often. Moreover, we would gain the
benefit that PostgreSQL handles collations correctly on every
platform that PostgreSQL supports.

> 6. Locales for SQL_ASCII. Seems to me you have two options, either
> reject COLLATE altogether unless they specify a charset, or don't care
> and let the user shoot themselves in the foot if they wish...
>
> BTW, this MacOS locale supports seems to be new for 10.4.2 according to
> the CVS log info, can anyone confirm this?
>
> Anyway, I hope this post didn't bore too much. Locale support has been
> one of those things that has bugged me for a long time and it would be
> nice if there could be some real movement.

Right. We Japanese (and probably the Chinese too) have been bugged by the
broken multibyte locales for a long time. Using the C locale helps us to a
certain extent, but for Unicode we need correct locale data; otherwise
the sorted data will be complete chaos.
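
To illustrate the difference, here is a small sketch that compares
plain byte order (what the C locale gives for UTF-8 strings) with a
lookup in collation data. The weight table is a toy made up for this
example, covering only four kana, and toy_collate is a hypothetical
name; it is not real locale data:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy collation data (assumption): one primary weight per character. */
typedef struct ToyWeight
{
    const char *utf8;     /* UTF-8 encoding of a single character */
    int         primary;  /* position in reading order */
} ToyWeight;

static const ToyWeight toy_weights[] = {
    {"\xe3\x81\x82", 1},  /* HIRAGANA LETTER A  (U+3042) */
    {"\xe3\x82\xa2", 1},  /* KATAKANA LETTER A  (U+30A2) */
    {"\xe3\x81\x8b", 2},  /* HIRAGANA LETTER KA (U+304B) */
    {"\xe3\x82\xab", 2},  /* KATAKANA LETTER KA (U+30AB) */
};

/* Look up the primary weight; -1 if the string is not in the toy table. */
static int
toy_primary(const char *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < sizeof(toy_weights) / sizeof(toy_weights[0]); i++)
        if (strcmp(toy_weights[i].utf8, s) == 0)
            return toy_weights[i].primary;
    return -1;
}

/* C-locale style comparison: raw byte order. */
int
byte_compare(const char *a, const char *b)
{
    return strcmp(a, b);
}

/*
 * Table-driven comparison: primary weight first, then bytes as a
 * tie-break so that the ordering stays total.
 */
int
toy_collate(const char *a, const char *b)
{
    int wa = toy_primary(a);
    int wb = toy_primary(b);

    if (wa != wb)
        return (wa > wb) - (wa < wb);
    return strcmp(a, b);
}
```

In byte order every hiragana sorts before every katakana, so hiragana
KA comes before katakana A. With the weights, both forms of A come
before both forms of KA, which is the order a reader would expect.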
--
SRA OSS, Inc. Japan
Tatsuo Ishii
