Re: Patch for collation using ICU

From: "John Hansen" <john(at)geeknet(dot)com(dot)au>
To: "Palle Girgensohn" <girgen(at)pingpong(dot)net>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for collation using ICU
Date: 2005-03-25 23:42:19
Message-ID: 5066E5A966339E42AA04BA10BA706AE5627C@rodrick.geeknet.com.au
Lists: pgsql-hackers
> On Friday, March 25, 2005 23.39.33 +1100, John Hansen
> <john(at)geeknet(dot)com(dot)au> wrote:
> 
> > OK, tested on Debian sarge with ICU 3.2, UNICODE database, C locale.
> >
> > upper() and lower() return an empty string for any input, including
> > 7-bit ASCII, regardless of client_encoding, so something is obviously
> > broken.
> >
> > Have you tested this patch on a UNICODE DB with locale C/POSIX?

FYI, I also found that initdb crashes with exit status 139 (which
typically indicates a segmentation fault) on any locale other than
C/POSIX.

> 
> No, honestly not. Mostly tested it with my needs, sv_SE.UTF-8 
> and UNICODE, and also de_DE.UTF-8.
> 
> How will PostgreSQL react to this combo? A database cluster
> initdb'ed with locale=C/POSIX, and then a database in UNICODE
> (really UTF-8) representation... hmm... I think I might have
> made a false assumption that the locale string would contain
> the character encoding. I do something like encoding =
> strchr(locale, '.') + 1... That code will be confused by a 'C'
> locale, indeed, since there is no '.' to find. I'll check it out!
> 
> /Palle
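
To illustrate the assumption discussed above: strchr() returns NULL when the locale name contains no '.', so adding 1 to the result gives an invalid pointer for "C" or "POSIX". The following is only a minimal sketch of a guarded lookup, not code from the patch; the function name locale_codeset is hypothetical, and what the caller should fall back to when no codeset is present (for example the database encoding) is an assumption.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: return the part of a
 * locale name after the first '.', e.g. "UTF-8" for "sv_SE.UTF-8".
 * Returns NULL when the name carries no codeset ("C", "POSIX", or a
 * name ending in '.'), so the caller can pick a fallback instead of
 * dereferencing strchr(...) + 1 blindly.
 * Note: any "@modifier" suffix is left in the returned string.
 */
const char *
locale_codeset(const char *locale)
{
    const char *dot;

    if (locale == NULL)
        return NULL;

    dot = strchr(locale, '.');
    if (dot == NULL || dot[1] == '\0')
        return NULL;            /* no codeset in this locale name */

    return dot + 1;
}
```

A caller would test the result for NULL before using it, rather than assuming that every locale name has the form language_TERRITORY.codeset.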
> 
> 
> 
> >
> > ... John
> >
> >> -----Original Message-----
> >> From: John Hansen
> >> Sent: Friday, March 25, 2005 10:27 PM
> >> To: 'Palle Girgensohn'; 'pgsql-hackers(at)postgresql(dot)org'
> >> Subject: RE: [HACKERS] Patch for collation using ICU
> >>
> >> > On Friday, March 25, 2005 16.34.41 +1100, John Hansen
> >> > <john(at)geeknet(dot)com(dot)au> wrote:
> >> >
> >> > > Useful if it's going to support earlier releases of ICU....
> >> > >
> >> > > Not all OSes come with ICU 3.2; Debian, for example, currently has
> >> > > 2.1 in testing and 2.6 in unstable.
> >> >
> >> > Oh, OK. FreeBSD has only 3.2 as a port. I can check the older
> >> > version; I doubt it would make too much difference. Some autoconf
> >> > sorcery needed, perhaps.
> >>
> >> Naww, it's no biggie, we'll just need to include ICU with pg, I think.
> >> I tried that; there are several functions from ICU that you use that
> >> are not in ICU 2.1.
> >>
> >> Dunno about 2.6.
> >>
> >> However, ICU 3.2 compiles on Debian with a small change to the
> >> debian/rules file:
> >> debian/tmp/etc is missing, so add mkdir debian/tmp/etc
> >>
> >> ... John
> >>
> >> >
> >> > /Palle
> >> >
> >> > >
> >> > > ... John
> >> > >
> >> > >> -----Original Message-----
> >> > >> From: pgsql-hackers-owner(at)postgresql(dot)org
> >> > >> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of
> >> > >> Palle Girgensohn
> >> > >> Sent: Friday, March 25, 2005 10:40 AM
> >> > >> To: pgsql-hackers(at)postgresql(dot)org
> >> > >> Subject: [HACKERS] Patch for collation using ICU
> >> > >>
> >> > >> Hi!
> >> > >>
> >> > >> I've put together a patch for using IBM's ICU package for
> >> > >> collation.
> >> > >>
> >> > >> If your OS does not have full support for collation or
> >> > >> uppercase/lowercase in multibyte locales, this might be useful. If
> >> > >> you are using a multibyte character encoding in your database and
> >> > >> want collation, i.e. ORDER BY, and also lower(), upper() and
> >> > >> initcap() to work properly, this patch will do just that.
> >> > >>
> >> > >> This patch is needed for FreeBSD, since this OS has no support for
> >> > >> collation of, for example, Unicode locales (that is, wcscoll(3)
> >> > >> does not do what you expect if you set LC_ALL=sv_SE.UTF-8, for
> >> > >> example). AFAIK the patch is *not* necessary for Linux, although
> >> > >> IBM claims ICU collation to be about twice as fast as glibc for
> >> > >> simple western locales.
> >> > >>
> >> > >> It adds a configure switch, `--with-icu', which will set up the
> >> > >> code to use ICU instead of wchar_t and wcscoll.
> >> > >>
> >> > >> This has been tested only on FreeBSD-4.11 & FreeBSD-5-stable,
> >> > >> where it seems to run well. I've not had the time to do any
> >> > >> comparative performance tests yet, but it seems it is at least not
> >> > >> slower than using LATIN1 with the sv_SE.ISO8859-1 locale, perhaps
> >> > >> even faster.
> >> > >>
> >> > >> I'd be delighted if some more experienced postgresql hackers would
> >> > >> review this stuff. The patch is pretty compact, so it's fast
> >> > >> reading :)  I'm planning to add this patch as an option (tagged
> >> > >> "experimental") to FreeBSD's postgresql port. Any ideas about
> >> > >> whether this is a good idea or not?
> >> > >>
> >> > >> Any thoughts or ideas are welcome!
> >> > >>
> >> > >> Cheers,
> >> > >> Palle
> >> > >>
> >> > >> Patch at:
> >> > >> <http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2005-03-14.diff>
> >> > >>
> >> > >> ICU at sourceforge: <http://icu.sf.net/>
