Re: Patch for collation using ICU

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Palle Girgensohn <girgen(at)pingpong(dot)net>
Cc: John Hansen <john(at)geeknet(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Patch for collation using ICU
Date: 2005-05-07 14:06:43
Message-ID: 200505071406.j47E6h600785@candle.pha.pa.us
Lists: pgsql-hackers

Palle Girgensohn wrote:
>
> --On lördag, maj 07, 2005 23.15.29 +1000 John Hansen <john(at)geeknet(dot)com(dot)au>
> wrote:
>
> > Btw, I had been planning to propose replacing every single one of the
> > built in charset conversion functions with calls to ICU (thus making pg
> > _depend_ on ICU), as this would seem like a cleaner solution than for us
> > to maintain our own conversion tables.
> >
> > ICU also has a fair few conversions that we do not have at present.

That is a much larger issue, similar to our shipping our own timezone
database. What does it buy us?

o Do we ship it in our tarball?
o Is the license compatible?
o Does it remove utils/mb conversions?
o Does it allow us to index LIKE (next high char)?
o Does it allow us to support multiple encodings in
  a single database more easily?
o Does it improve performance?

> I just had a similar thought. And why use ICU only for multibyte charsets?
> If I use LATIN1, I still expect upper('ß') => SS, and I don't get it...
> Same for the Turkish example.

We assume the native toupper() can handle single-byte character
encodings. We use towupper() only for wide character sets.
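
As a rough illustration of that split (a minimal sketch, not the actual
backend code), the function below uses toupper() byte by byte when the
current locale's encoding is single-byte and goes through wchar_t and
towupper() otherwise. The name upper_sketch, its signature, and the
MB_CUR_MAX test are assumptions made for this example only. Note that both
toupper() and towupper() return exactly one character per input character,
so neither path can expand 'ß' into "SS".

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not PostgreSQL's implementation): uppercase src into
 * dst, which has room for dstlen bytes, using the current LC_CTYPE locale.
 * Returns 0 on success, -1 on invalid input or insufficient space.
 */
int
upper_sketch(const char *src, char *dst, size_t dstlen)
{
    size_t      len;
    size_t      nchars;
    size_t      nbytes;
    size_t      i;
    wchar_t    *wbuf;

    assert(src != NULL && dst != NULL);
    len = strlen(src);

    if (MB_CUR_MAX == 1)
    {
        /* Single-byte encoding: one byte is one character. */
        if (len + 1 > dstlen)
            return -1;
        for (i = 0; i < len; i++)
            dst[i] = (char) toupper((unsigned char) src[i]);
        dst[len] = '\0';
        return 0;
    }

    /* Multibyte encoding: never more wide characters than input bytes. */
    wbuf = malloc((len + 1) * sizeof(wchar_t));
    if (wbuf == NULL)
        return -1;

    nchars = mbstowcs(wbuf, src, len + 1);
    if (nchars == (size_t) -1)
    {
        free(wbuf);             /* invalid multibyte sequence */
        return -1;
    }

    for (i = 0; i < nchars; i++)
        wbuf[i] = (wchar_t) towupper((wint_t) wbuf[i]);

    /* Convert back; a result equal to dstlen means no room for the NUL. */
    nbytes = wcstombs(dst, wbuf, dstlen);
    free(wbuf);
    if (nbytes == (size_t) -1 || nbytes >= dstlen)
        return -1;
    return 0;
}
```

A caller would normally run setlocale(LC_CTYPE, "") first so that MB_CUR_MAX
reflects the environment's encoding; without that call the program stays in
the "C" locale and takes the single-byte path.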

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
