Re: Patch for collation using ICU

From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: John Hansen <john(at)geeknet(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Patch for collation using ICU
Date: 2005-03-25 13:22:51
Message-ID: 9660F286965D59F2BEE49288@palle.girgensohn.se
Lists: pgsql-hackers


--On Friday, March 25, 2005 23.39.33 +1100 John Hansen <john(at)geeknet(dot)com(dot)au>
wrote:

> Ok,.. tested on debian sarge with ICU 3.2
> UNICODE Database, C locale.
>
> upper() and lower() return an empty string for any input, including
> 7bit ascii, regardless of client_encoding, so something is obviously
> broken.
>
> Have you tested this patch on a UNICODE DB with locale C/POSIX ?

No, honestly I have not. I mostly tested it for my own needs: sv_SE.UTF-8
with UNICODE, and also de_DE.UTF-8.

How will PostgreSQL react to this combination: a database cluster
initdb'ed with locale=C/POSIX, and then a database in UNICODE (really
UTF-8) encoding? Hmm... I think I made the false assumption that the
locale string always contains the character encoding. I do something like
encoding = strchr(locale, '.') + 1. With a 'C' locale there is no '.' in
the string, so strchr() returns NULL and that code will indeed be
confused. I'll check it out!
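
To illustrate the suspected failure, here is a minimal sketch of a guarded
lookup. It is not the code from the patch: the helper name
encoding_from_locale and the idea of passing in a fallback (for example the
database encoding) are assumptions made only for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch.
 * Returns the codeset part of a locale name such as "sv_SE.UTF-8".
 * When the name has no '.' (for example "C" or "POSIX"), or nothing
 * follows the '.', the caller-supplied fallback is returned instead.
 */
static const char *
encoding_from_locale(const char *locale, const char *fallback)
{
    const char *dot = (locale != NULL) ? strchr(locale, '.') : NULL;

    /*
     * strchr() returns NULL when there is no '.', so the unguarded
     * form strchr(locale, '.') + 1 would produce an invalid pointer.
     */
    if (dot == NULL || dot[1] == '\0')
        return fallback;

    return dot + 1;
}
```

With this, encoding_from_locale("sv_SE.UTF-8", "SQL_ASCII") gives "UTF-8",
while encoding_from_locale("C", "UTF-8") falls back to "UTF-8" rather than
dereferencing a bad pointer. Whether the fallback should come from the
database encoding is a design question for the patch itself.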

/Palle

>
> ... John
>
>> -----Original Message-----
>> From: John Hansen
>> Sent: Friday, March 25, 2005 10:27 PM
>> To: 'Palle Girgensohn'; 'pgsql-hackers(at)postgresql(dot)org'
>> Subject: RE: [HACKERS] Patch for collation using ICU
>>
>> > --On Friday, March 25, 2005 16.34.41 +1100 John Hansen
>> > <john(at)geeknet(dot)com(dot)au>
>> > wrote:
>> >
>> > > Useful if it's going to support earlier releases of ICU....
>> > >
>> > > Not all OSes come with ICU 3.2; Debian, for example, currently has
>> > > 2.1 in testing, and 2.6 in unstable.
>> >
>> > Oh, OK. FreeBSD has only 3.2 as a port. I can check the older
>> > versions; I doubt it would make too much difference. Some
>> > autoconf sorcery needed, perhaps.
>>
>> Naww, it's no biggie, we'll just need to include ICU with pg I think.
>> I tried that; there are several functions from ICU that you
>> use that are not in ICU 2.1.
>>
>> Don't know about 2.6.
>>
>> However, ICU3.2 compiles on debian with a small change to the
>> debian/rules file.
>> debian/tmp/etc is missing, so add mkdir debian/tmp/etc
>>
>> ... John
>>
>> >
>> > /Palle
>> >
>> > >
>> > > ... John
>> > >
>> > >> -----Original Message-----
>> > >> From: pgsql-hackers-owner(at)postgresql(dot)org
>> > >> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Palle
>> > >> Girgensohn
>> > >> Sent: Friday, March 25, 2005 10:40 AM
>> > >> To: pgsql-hackers(at)postgresql(dot)org
>> > >> Subject: [HACKERS] Patch for collation using ICU
>> > >>
>> > >> Hi!
>> > >>
>> > >> I've put together a patch for using IBM's ICU package for
>> > >> collation.
>> > >>
>> > >> If your OS does not have full support for collation or
>> > >> uppercase/lowercase in multibyte locales, this might be useful.
>> > >> If you are using a multibyte character encoding in your database
>> > >> and want collation, i.e. ORDER BY, and also lower(), upper() and
>> > >> initcap() to work properly, this patch will do just that.
>> > >>
>> > >> This patch is needed for FreeBSD, since this OS has no support
>> > >> for collation of, for example, Unicode locales (that is,
>> > >> wcscoll(3) does not do what you expect if you set
>> > >> LC_ALL=sv_SE.UTF-8, for example). AFAIK the patch is *not*
>> > >> necessary for Linux, although IBM claims ICU collation to be
>> > >> about twice as fast as glibc for simple western locales.
>> > >>
>> > >> It adds a configure switch, `--with-icu', which will set up the
>> > >> code to use ICU instead of wchar_t and wcscoll.
>> > >>
>> > >> This has been tested only on FreeBSD-4.11 and FreeBSD-5-stable,
>> > >> where it seems to run well. I've not had the time to do any
>> > >> comparative performance tests yet, but it seems it is at least
>> > >> not slower than using LATIN1 with the sv_SE.ISO8859-1 locale,
>> > >> perhaps even faster.
>> > >>
>> > >> I'd be delighted if some more experienced PostgreSQL hackers
>> > >> would review this stuff. The patch is pretty compact, so it's
>> > >> fast reading :) I'm planning to add this patch as an option
>> > >> (tagged "experimental") to FreeBSD's postgresql port. Any ideas
>> > >> about whether this is a good idea or not?
>> > >>
>> > >> Any thoughts or ideas are welcome!
>> > >>
>> > >> Cheers,
>> > >> Palle
>> > >>
>> > >> Patch at:
>> > >> <http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2005-03-14.diff>
>> > >>
>> > >> ICU at sourceforge: <http://icu.sf.net/>
>> > >>
