Re: Patch for collation using ICU

From: "John Hansen" <john(at)geeknet(dot)com(dot)au>
To: "Tatsuo Ishii" <t-ishii(at)sra(dot)co(dot)jp>
Cc: <pgman(at)candle(dot)pha(dot)pa(dot)us>, <girgen(at)pingpong(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for collation using ICU
Date: 2005-05-08 04:07:29
Message-ID: 5066E5A966339E42AA04BA10BA706AE50A930B@rodrick.geeknet.com.au
Lists: pgsql-hackers

Tatsuo Ishii wrote:
> Sent: Sunday, May 08, 2005 10:09 AM
> To: John Hansen
> Cc: pgman(at)candle(dot)pha(dot)pa(dot)us; girgen(at)pingpong(dot)net;
> pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Patch for collation using ICU
>
> > Bruce Momjian wrote:
> > >
> > > There are two reasons for that optimization --- first, some locale
> > > support is broken and Unicode encoding with a C locale crashes (not
> > > an issue for ICU), and second, it is an optimization for languages
> > > like Japanese that want to use unicode, but don't need a locale
> > > because upper/lower means nothing in those character sets.
> >
> > No, upper/lower means nothing in those languages, so why would you
> > need to optimize upper/lower if they're not used??
> > And if they are, it's obviously because the text contains characters
> > from other languages (probably english) and as such they should behave
> > correctly.
>
> Yes, Japanese (and probably Chinese and Korean) languages
> include ASCII characters. More precisely, ASCII is part of the Japanese
> encodings (LATIN1 is not, however). And we have no problem at
> all with glibc/C locale. See below ("unitest" is a UNICODE database).
>
> unitest=# create table t1(t text);
> CREATE TABLE
> unitest=# \encoding EUC_JP
> unitest=# insert into t1 values('abcあいう');
> INSERT 1842628 1
> unitest=# select upper(t) from t1;
> upper
> -----------
> ABCあいう
> (1 row)
>
> So Japanese (including ASCII)/UNICODE behavior is perfectly
> correct at this moment.

Right, so you _never_ use accented Latin characters in Japanese text?
(like è, for example, whose uppercase is È)
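
To make the difference concrete, here is a minimal sketch in Python (not PostgreSQL or ICU code). The helper names ascii_only_upper and unicode_upper are hypothetical, and the ASCII-only function is only an assumed stand-in for a C-locale style fast path, not the actual optimization under discussion. It shows that such a path handles 'abcあいう' correctly but leaves è untouched, whereas a Unicode-aware mapping produces È.

```python
def ascii_only_upper(text: str) -> str:
    """Uppercase only the ASCII letters a-z; leave everything else as is.

    Hypothetical stand-in for a C-locale style fast path (an assumption,
    not the PostgreSQL implementation).
    """
    return "".join(
        chr(ord(ch) - 32) if "a" <= ch <= "z" else ch
        for ch in text
    )


def unicode_upper(text: str) -> str:
    """Uppercase using Python's built-in Unicode case mapping."""
    return text.upper()


# "abc" + hiragana a/i/u + precomposed e-grave (U+00E8)
sample = "abc\u3042\u3044\u3046\u00e8"

print(ascii_only_upper(sample))  # the e-grave stays lowercase
print(unicode_upper(sample))     # the e-grave becomes U+00C8
```

Both functions agree on the sample from the quoted psql session; they differ only once a cased character outside ASCII appears.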

> So I strongly object to removing that optimization.

I guess this calls for a vote then, because if we implement ICU,
I'd have to object to leaving it in.

Changing the behaviour of ICU doesn't seem right. Changing the behaviour of pg,
so that it works as it should when using unicode, seems the right solution to me.

> --
> Tatsuo Ishii
>
>
