Re: Patch for collation using ICU

From: "John Hansen" <john(at)geeknet(dot)com(dot)au>
To: "John Hansen" <john(at)geeknet(dot)com(dot)au>, "Palle Girgensohn" <girgen(at)pingpong(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for collation using ICU
Date: 2005-03-25 12:39:33
Message-ID: 5066E5A966339E42AA04BA10BA706AE5627B@rodrick.geeknet.com.au
Lists: pgsql-hackers

OK, tested on Debian sarge with ICU 3.2,
UNICODE database, C locale.

upper() and lower() return an empty string for any input, including
7-bit ASCII, regardless of client_encoding, so something is obviously
broken.

Have you tested this patch on a UNICODE DB with locale C/POSIX?
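
A minimal sketch of the kind of sanity check that would catch this, assuming the case-mapping routines under test can be wrapped as plain callables. The name check_case_mapping and the sample strings are hypothetical, not part of the patch. It only uses 7-bit ASCII input, where the expected result is not in doubt:

```python
def check_case_mapping(upper, lower,
                       samples=("abc", "ABC", "Hello, World 123")):
    """Return (function name, input, output) for every unexpected result.

    `upper` and `lower` are the routines under test. The samples are
    7-bit ASCII only, so Python's own str.upper()/str.lower() serve as
    the reference for what the output should be.
    """
    failures = []
    for text in samples:
        checks = (
            ("upper", upper, text.upper()),
            ("lower", lower, text.lower()),
        )
        for name, func, expected in checks:
            result = func(text)
            if result != expected:
                failures.append((name, text, result))
    return failures


if __name__ == "__main__":
    # Stand-in for the behaviour reported above: always an empty string.
    for failure in check_case_mapping(lambda s: "", lambda s: ""):
        print(failure)
```

Against a real server the two callables would wrap queries calling upper() and lower(); that wiring is left out here because it depends on the client library in use.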

... John

> -----Original Message-----
> From: John Hansen
> Sent: Friday, March 25, 2005 10:27 PM
> To: 'Palle Girgensohn'; 'pgsql-hackers(at)postgresql(dot)org'
> Subject: RE: [HACKERS] Patch for collation using ICU
>
> > --On Friday, March 25, 2005 16.34.41 +1100 John Hansen
> > <john(at)geeknet(dot)com(dot)au> wrote:
> >
> > > Useful if it's going to support earlier releases of ICU....
> > >
> > > Not all OSes come with ICU 3.2; Debian, for example, currently
> > > has 2.1 in testing and 2.6 in unstable.
> >
> > Oh, OK. FreeBSD has only 3.2 as a port. I can check the older
> > version; I doubt it would make too much difference. Some autoconf
> > sorcery needed, perhaps.
>
> Naw, it's no biggie; we'll just need to include ICU with pg, I think.
> I tried that: several ICU functions that you use are not in ICU 2.1.
>
> Don't know about 2.6.
>
> However, ICU 3.2 compiles on Debian with a small change to the
> debian/rules file:
> debian/tmp/etc is missing, so add mkdir debian/tmp/etc
>
> ... John
>
> >
> > /Palle
> >
> > >
> > > ... John
> > >
> > >> -----Original Message-----
> > >> From: pgsql-hackers-owner(at)postgresql(dot)org
> > >> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Palle
> > >> Girgensohn
> > >> Sent: Friday, March 25, 2005 10:40 AM
> > >> To: pgsql-hackers(at)postgresql(dot)org
> > >> Subject: [HACKERS] Patch for collation using ICU
> > >>
> > >> Hi!
> > >>
> > >> I've put together a patch for using IBM's ICU package for collation.
> > >>
> > >> If your OS does not have full support for collation or
> > >> uppercase/lowercase in multibyte locales, this might be useful. If
> > >> you are using a multibyte character encoding in your database and
> > >> want collation, i.e. order by, and also lower(), upper() and
> > >> initcap() to work properly, this patch will do just that.
> > >>
> > >> This patch is needed for FreeBSD, since this OS has no support for
> > >> collation of, for example, Unicode locales (that is, wcscoll(3) does
> > >> not do what you expect if you set LC_ALL=sv_SE.UTF-8, for example).
> > >> AFAIK the patch is *not* necessary for Linux, although IBM claims ICU
> > >> collation to be about twice as fast as glibc for simple western
> > >> locales.
> > >>
> > >> It adds a configure switch, `--with-icu', which will set up the code
> > >> to use ICU instead of wchar_t and wcscoll.
> > >>
> > >> This has been tested only on FreeBSD-4.11 & FreeBSD-5-stable, where
> > >> it seems to run well. I've not had the time to do any comparative
> > >> performance tests yet, but it seems it is at least not slower than
> > >> using LATIN1 with sv_SE.ISO8859-1 locale, perhaps even faster.
> > >>
> > >> I'd be delighted if some more experienced postgresql hackers would
> > >> review this stuff. The patch is pretty compact, so it's fast reading
> > >> :) I'm planning to add this patch as an option (tagged
> > >> "experimental") to FreeBSD's postgresql port. Any ideas about whether
> > >> this is a good idea or not?
> > >>
> > >> Any thoughts or ideas are welcome!
> > >>
> > >> Cheers,
> > >> Palle
> > >>
> > >> Patch at:
> > >> <http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2005-03-14.diff>
> > >>
> > >> ICU at sourceforge: <http://icu.sf.net/>
> > >>
> > >>
> >
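
The quoted announcement says that wcscoll(3) on FreeBSD does not do what you expect under LC_ALL=sv_SE.UTF-8. A rough way to see what a given platform's own collation does, without PostgreSQL or ICU involved, is to sort a few strings through the C library's locale support. This is only a sketch: sort_with_locale is a hypothetical helper name, and it uses Python's standard locale module rather than anything from the patch.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort `words` by the platform collation rules of `locale_name`.

    Returns None if the platform does not provide that locale, so the
    caller can tell "locale missing" apart from "locale sorts oddly".
    """
    # Remember the current LC_COLLATE setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm() turns each string into a key whose plain comparison
        # follows the active locale's collation order.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    words = ["z", "a", "ö", "å", "ä"]
    print("C:          ", sort_with_locale(words, "C"))
    print("sv_SE.UTF-8:", sort_with_locale(words, "sv_SE.UTF-8"))
```

In Swedish, å, ä and ö are separate letters that sort after z, so comparing the sv_SE.UTF-8 result against that expectation gives a quick indication of whether the operating system's collation is usable for a multibyte locale.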
