Using multi-locale support in glibc

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Using multi-locale support in glibc
Date: 2005-09-01 15:57:41
Message-ID: 20050901155741.GE28062@svana.org
Lists: pgsql-hackers

Browsing the glibc stuff for locales I noticed that glibc does actually
allow you to specify the collation order to strcoll and friends. The
feature is however marked with:

Attention: all these functions are *not* standardized in any form.
This is a proof-of-concept implementation.

They do however work fine. I used my taggedtypes module to create a
type that binds the collation order to the text strings and the results
can be seen below.
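
For readers who have not seen the interface, here is a minimal sketch of a per-call collation comparison. It assumes the names later standardized in POSIX.1-2008 (newlocale, strcoll_l, freelocale), which current glibc exposes; older glibc versions may spell or guard them differently. The helper name compare_in_locale is hypothetical and is not part of the taggedtypes module.

```c
#define _GNU_SOURCE /* assumption: needed on older glibc to expose the *_l functions */
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: compare a and b using the collation rules of the
 * named locale. Returns 0 on success and stores the strcoll_l() result in
 * *result; returns -1 if the library cannot load the locale. */
int compare_in_locale(const char *locale_name, const char *a, const char *b,
                      int *result)
{
    assert(locale_name != NULL && a != NULL && b != NULL && result != NULL);

    /* Build a locale object that carries only the collation category. */
    locale_t loc = newlocale(LC_COLLATE_MASK, locale_name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return -1;

    /* The global locale set by setlocale() is not consulted here. */
    *result = strcoll_l(a, b, loc);

    freelocale(loc);
    return 0;
}
```

Creating and freeing the locale object on every call keeps the sketch short; real code would presumably cache one locale_t per locale name.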

1. Is something supported by glibc usable for us (re portability to
non-glibc platforms)?

2. Should we be trying to use an interface that's specifically marked
as unstable?

3. What's the plan to support multiple collate orders? There was a
message about it last year but I don't see much progress.

4. It makes some things more difficult. For example, my database is
UNICODE, and until I specified a UTF8 locale the results didn't come out
right. AFAIK the only easy way to determine whether a locale is UTF8
compatible is to use locale -k charmap; the C interface is hidden. It
should be possible to compile a list of locales and allow only the ones
matching the database encoding. Or automatically convert the strings; the
conversion functions exist.

5. Maybe we should evaluate the interface and give feedback to the
glibc developers to see if it can be made more stable.
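
On point 4, a possible way to check a locale's charmap from C, sketched under the assumption that nl_langinfo_l() is available (it is specified in POSIX.1-2008 and present in current glibc; whether it is usable on a given older system is not something this sketch verifies). The helper name locale_uses_charmap is hypothetical.

```c
#define _GNU_SOURCE /* assumption: needed on older glibc to expose nl_langinfo_l */
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the named locale reports `charmap` as
 * its codeset, 0 if it reports something else, and -1 if the locale
 * cannot be loaded. */
int locale_uses_charmap(const char *locale_name, const char *charmap)
{
    assert(locale_name != NULL && charmap != NULL);

    /* The codeset is part of the LC_CTYPE category. */
    locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return -1;

    /* Compare before freelocale(): the returned string belongs to loc. */
    const char *codeset = nl_langinfo_l(CODESET, loc);
    int match = (strcmp(codeset, charmap) == 0);

    freelocale(loc);
    return match;
}
```

A caller could run this over the list of installed locales and keep only those where locale_uses_charmap(name, "UTF-8") returns 1. The exact codeset spelling that a locale reports is platform dependent, so a real check would probably need to normalize names first.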

If you want to have a look to see what's available, use:
rgrep -3 locale_t /usr/include/ |less

Have a nice day,

PS. The code to test this can be found at:
http://svana.org/kleptog/pgsql/taggedtypes.html

--- TEST OUTPUT ---

test=# select strings from taggedtypes.locale_test order by locale_text( strings, 'C' );
strings
---------
Test2
Tést1
Tëst1
test1
tèst2
(5 rows)

test=# select strings from taggedtypes.locale_test order by locale_text( strings, 'en_US' );
strings
---------
Tëst1
Tést1
tèst2
test1
Test2
(5 rows)

test=# select strings from taggedtypes.locale_test order by locale_text( strings, 'nl_NL' );
ERROR: Locale 'nl_NL' not supported by library

test=# select strings from taggedtypes.locale_test order by locale_text( strings, 'en_AU.UTF-8' );
strings
---------
test1
Tést1
Tëst1
Test2
tèst2
(5 rows)
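
The same kind of ordering can be sketched outside the database by sorting an array with a locale-specific comparator. This is only an illustration of the idea, not the code behind locale_text; the names sort_in_locale and cmp_strings are hypothetical, and the POSIX.1-2008 spellings newlocale and strcoll_l are assumed.

```c
#define _GNU_SOURCE /* assumption: needed on older glibc to expose the *_l functions */
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() passes no context to the comparator, so the locale lives in a
 * file-scope variable. This makes the sketch non-reentrant. */
static locale_t sort_locale;

static int cmp_strings(const void *pa, const void *pb)
{
    const char *a = *(const char *const *) pa;
    const char *b = *(const char *const *) pb;
    return strcoll_l(a, b, sort_locale);
}

/* Hypothetical helper: sort n strings in place using the collation order
 * of the named locale. Returns 0 on success, -1 if the locale cannot be
 * loaded (compare the 'nl_NL' error above). */
int sort_in_locale(const char **strings, size_t n, const char *locale_name)
{
    assert(strings != NULL && locale_name != NULL);

    sort_locale = newlocale(LC_COLLATE_MASK, locale_name, (locale_t) 0);
    if (sort_locale == (locale_t) 0)
        return -1;

    qsort(strings, n, sizeof *strings, cmp_strings);

    freelocale(sort_locale);
    sort_locale = (locale_t) 0;
    return 0;
}
```

With the 'C' locale the comparison is bytewise, so uppercase ASCII sorts before lowercase, as in the first query above. The order produced for other locales depends on the locale data installed on the system.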
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
