Re: Patch: add conversion from pg_wchar to multibyte

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Patch: add conversion from pg_wchar to multibyte
Date: 2012-07-03 21:41:11
Message-ID: CAPpHfdssF4epQsghxDyyw_=8=tscHaXCU-wx14EoQwTsipvrEw@mail.gmail.com
Lists: pgsql-hackers
On Tue, Jul 3, 2012 at 10:17 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:

> > OK.  So, in that case, I suggest that if the leading byte is non-zero,
> > we emit 0x9d followed by the three available bytes, instead of first
> > testing whether the first byte is >= 0xf0.  That test seems to serve
> > no purpose but to confuse the issue.
>
> Probably the code should look like this (see the comment below):
>
>                 else if (lb >= 0xf0 && lb <= 0xfe)
>                 {
>                     if (lb <= 0xf4)
>                         *to++ = 0x9c;
>                     else
>                         *to++ = 0x9d;
>                     *to++ = lb;
>                     *to++ = (*from >> 8) & 0xff;
>                     *to++ = *from & 0xff;
>                     cnt += 4;


We probably also need to give names to all these magic numbers
(0xf0, 0xf4, 0xfe, 0x9c, 0x9d), but I find it hard to come up with good ones.
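
To make the naming question concrete, here is a minimal sketch of how the
constants might be named. All identifiers below (the EX_ prefix, the macro
names, and the helper function) are hypothetical and chosen only for this
illustration; they are not taken from the patch or from the tree.

```c
#include <stdint.h>

/* Hypothetical names for illustration only; not from the patch. */
#define EX_LCPRV2_A        0x9c  /* lead byte for charset ids 0xf0..0xf4 */
#define EX_LCPRV2_B        0x9d  /* lead byte for charset ids 0xf5..0xfe */
#define EX_LCPRV2_A_FIRST  0xf0
#define EX_LCPRV2_A_LAST   0xf4
#define EX_LCPRV2_B_FIRST  0xf5
#define EX_LCPRV2_B_LAST   0xfe

/*
 * Return the multibyte lead byte that goes with charset id "lb",
 * or 0 if "lb" is outside both private two-byte ranges.
 */
static unsigned char
ex_lcprv2_lead_byte(unsigned char lb)
{
    if (lb >= EX_LCPRV2_A_FIRST && lb <= EX_LCPRV2_A_LAST)
        return EX_LCPRV2_A;
    if (lb >= EX_LCPRV2_B_FIRST && lb <= EX_LCPRV2_B_LAST)
        return EX_LCPRV2_B;
    return 0;
}
```

With names like these, the branch quoted above would compare against the
range macros instead of bare hex literals.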


> > I further suggest that we improve the comments on the mule functions
> > for both wchar->mb and mb->wchar to make all this more clear.
>
> I have added comments about the mule internal encoding, based on
> refreshing my memory and on an old document found on the web
> (http://mibai.tec.u-ryukyu.ac.jp/cgi-bin/info2www?%28mule%29Buffer%20and%20string).
>
> Please take a look.  BTW, it seems the conversion between multibyte and
> wchar can round-trip in the case where the leading character is LCPRV2:
>
> If the second byte of the wchar (of its 4 bytes; the first byte is
> always 0x00) is in the range 0xf0 to 0xf4, then the first byte of the
> multibyte must be 0x9c.  If the second byte of the wchar is in the range
> 0xf5 to 0xfe, then the first byte of the multibyte must be 0x9d.
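
The round-trip rule quoted above can be sketched as a pair of standalone
functions. This is only an illustration under stated assumptions: the wchar
is taken to be laid out as 0x00 | charset id | byte 1 | byte 2, and the
function names are hypothetical, not the ones used in the patch.

```c
#include <stdint.h>

/*
 * Encode a wchar whose second byte is an LCPRV2 charset id (0xf0..0xfe).
 * Writes 4 bytes to "to" and returns 4, or returns 0 if the wchar does
 * not fit this case. Function name is hypothetical.
 */
static int
ex_wchar_to_mule_lcprv2(uint32_t wc, unsigned char *to)
{
    unsigned char lb = (wc >> 16) & 0xff;

    if ((wc >> 24) != 0 || lb < 0xf0 || lb > 0xfe)
        return 0;
    to[0] = (lb <= 0xf4) ? 0x9c : 0x9d;   /* lead byte follows from lb */
    to[1] = lb;
    to[2] = (wc >> 8) & 0xff;
    to[3] = wc & 0xff;
    return 4;
}

/*
 * Decode 4 multibyte bytes back into a wchar. Returns 1 on success, or 0
 * if the charset id is out of range or the lead byte does not match it.
 */
static int
ex_mule_lcprv2_to_wchar(const unsigned char *from, uint32_t *wc)
{
    unsigned char lb = from[1];
    unsigned char expected;

    if (lb < 0xf0 || lb > 0xfe)
        return 0;
    expected = (lb <= 0xf4) ? 0x9c : 0x9d;
    if (from[0] != expected)
        return 0;
    *wc = ((uint32_t) lb << 16) | ((uint32_t) from[2] << 8) | from[3];
    return 1;
}
```

Because the lead byte is fully determined by the charset id, dropping it in
the wchar loses no information, which is why encode followed by decode
returns the original value.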


Should I integrate these code changes into my patch, or would we rather
commit them first?

------
With best regards,
Alexander Korotkov.
