Re: Case Conversion Fix for MB Chars

From: Volkan YAZICI <volkan(dot)yazici(at)gmail(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Case Conversion Fix for MB Chars
Date: 2005-12-02 20:07:59
Message-ID: 7104a7370512021207w4d3568b2i37e156d9cb03daef@mail.gmail.com
Lists: pgsql-patches pgsql-tr-genel

Last minute edit:
src/test/mb seems a little out of date. I manually ran the SQL files in
src/test/mb/sql and compared the output with the expected results in
src/test/mb/expected, and they matched. (The output files need a little
editing first, such as removing lines like "CREATE TABLE".) Still, it
would be better if some EUC users tried them manually too.
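
The manual comparison described above could be scripted roughly as follows. This is only a sketch, not what was actually run: the function names and the COMMAND_TAGS list are hypothetical, and it assumes the only noise in the output is lines consisting of a command tag such as "CREATE TABLE".

```python
import difflib

# Hypothetical list of command tags to drop before comparing;
# adjust it to whatever psql actually prints in your output files.
COMMAND_TAGS = ("CREATE TABLE", "CREATE INDEX", "DROP TABLE")


def strip_command_tags(text, tags=COMMAND_TAGS):
    """Return the lines of text, minus lines that are just a command tag."""
    return [line for line in text.splitlines() if line.strip() not in tags]


def diff_outputs(actual, expected, tags=COMMAND_TAGS):
    """Return unified-diff lines between filtered expected and actual output."""
    return list(
        difflib.unified_diff(
            strip_command_tags(expected, tags),
            strip_command_tags(actual, tags),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
```

An empty list from diff_outputs() means the two outputs match once the command tags are ignored. The files would need to be read with the encoding used by the test in question before being passed in.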

On 12/2/05, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
> Volkan YAZICI wrote:
> > After Tom's advice (he was doubtful about the patch), while I was
> > thinking about how to broaden the range of tests, I decided to use
> > src/test/mb. In the tests, the patch succeeded only for unicode and
> > failed on the big5, euc_cn, euc_jp, euc_kr, euc_tw, mule_internal, and
> > sjis encodings. The failures could also be due to a misconfiguration on
> > my side. (I ran the tests on both unicode and latin-5 encoded databases.)
>
> Do those encodings even have uppercase letters?

According to the folks on IRC, yes.

> People have talked about ICU but I don't know if anyone is working on it
> now.

Furthermore, there are some unofficial ICU patches for PostgreSQL
around, like the one at
http://people.freebsd.org/~girgen/postgresql-icu/README.html

> I think the big problem is that while your patch works for some cases,
> it fails for others

As I mentioned above, it seems to be working for the other ones too.

> and there is no good way to know/test which will
> work and which will not. Is that accurate?

You don't want to commit this patch because it breaks[*] EUC-like
encodings. But OTOH, it fixes the LatinN and UNICODE encodings. I really
wonder: while we are trying to keep the EUC encodings working, why
aren't there any EUC users around to take care of the EUC tests?
Doesn't EUC have any problems? Do ILIKE and upper/lower work properly
for them?

[*] Unless I made a mistake, the manual tests succeeded for EUC-like
encodings too.

You can also look at this the other way around. Suppose LatinN and
UNICODE were working and somebody submitted a patch that fixed the EUC
encodings by breaking them. What would the PostgreSQL team's reaction be
in that situation?

Regards.
