Re: Unicode problems on IRC

From: "John Hansen" <john(at)geeknet(dot)com(dot)au>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unicode problems on IRC
Date: 2005-04-10 02:53:33
Message-ID: 5066E5A966339E42AA04BA10BA706AE5628D@rodrick.geeknet.com.au
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Bruce Momjian
> Sent: Sunday, April 10, 2005 8:18 AM
> To: Christopher Kings-Lynne
> Cc: pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Unicode problems on IRC
>
> Christopher Kings-Lynne wrote:
> > Hey guys,
> >
> > The 'Unicode characters above 0x10000' issue keeps rearing its ugly
> > head in the IRC channel. I propose that it be fixed, even backported...
> >
> > This is John Hansen's most recent patch to fix it:
> >
> > http://archives.postgresql.org/pgsql-patches/2004-11/msg00259.php
> >
> > And from what I can tell it was committed, then reverted because it
> > wasn't a "bug". It was going to go in for 8.1.
> >
> > We on the channel are starting to think that it is in fact a bug.
> > There are people with legitimately UTF-8 encoded XML documents
> > that they cannot store in PostgreSQL. Apparently in the distant past,
> > Unicode was limited to 0x10000, but then was extended.
> >
> > Perhaps we can reopen this case...
>
> Uh, I thought we fixed this another way, by not using
> Unicode-aware functions for upper/lower/initcap when the
> locale is "C" or "POSIX".
> That is backpatched to 8.0.X. Does that not fix the problem reported?

No. As Andrew said, this patch allows values > 0xffff and, at the
same time, validates the input to make sure it is valid UTF-8.
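
To make those two points concrete, here is a minimal sketch in Python of
what "allow values > 0xffff and validate the input" could look like. It is
not the patch itself (that is C code, linked in the quoted message above);
the function names are hypothetical and the checks follow the usual UTF-8
well-formedness rules rather than anything specific to the patch.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length implied by a lead byte, or 0 if the
    byte can never start a well-formed UTF-8 sequence.
    (Hypothetical helper, for illustration only.)"""
    if lead < 0x80:
        return 1                      # ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2                      # 0xC0/0xC1 would always be overlong
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4                      # code points above 0xFFFF
    return 0                          # continuation byte or 0xF5..0xFF


def is_valid_utf8(data: bytes) -> bool:
    """Check that data is well-formed UTF-8, including 4-byte sequences."""
    # Smallest code point that legitimately needs 1, 2, 3, or 4 bytes.
    minimum = (0x0, 0x80, 0x800, 0x10000)
    i, n = 0, len(data)
    while i < n:
        length = utf8_sequence_length(data[i])
        if length == 0 or i + length > n:
            return False              # bad lead byte or truncated sequence
        if length == 1:
            cp = data[i]
        else:
            # Keep only the payload bits of the lead byte.
            cp = data[i] & (0xFF >> (length + 1))
            for j in range(i + 1, i + length):
                if data[j] & 0xC0 != 0x80:
                    return False      # not a continuation byte
                cp = (cp << 6) | (data[j] & 0x3F)
        if cp < minimum[length - 1]:
            return False              # overlong encoding
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            return False              # surrogate or beyond Unicode range
        i += length
    return True
```

For example, is_valid_utf8("\U0001D11E".encode("utf-8")) accepts the
4-byte encoding of U+1D11E, while is_valid_utf8(b"\xc0\x80") rejects an
overlong encoding of NUL.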

... John
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square,
> Pennsylvania 19073
