Re: Rep:Re: [BUGS] Encoding Problem?

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: cnliou(at)eurosport(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rep:Re: [BUGS] Encoding Problem?
Date: 2002-03-05 15:01:27
Message-ID: 20020306000127E.t-ishii@sra.co.jp
Lists: pgsql-hackers

> I guess you are inserting correct EUC Traditional Chinese (EUC-TW)
> characters, but it is hard to tell what is happening unless you show
> us the character sequences in hexadecimal format.
> --
> Tatsuo Ishii
> ===============================
> Many thanks! Tatsuo,
>
> Please see below. Best Regards,
>
> CN
> ---------------
> linux:~$ cat /tmp/tt
> 1111
> ¦¨¥\
> ³\
> 2222
> linux:~$ od -t x /tmp/tt
> 0000000 31313131 a5a8a60a 5cb30a5c 3232320a
> 0000020 00000a32
> 0000022

Are you sure that they are EUC-TW? Taking the byte swapping in the od
output into account, the actual byte sequence is:

0x31,0x31,0x31,0x31,0x0a,
0xa6,0xa8,0xa5,0x5c,0x0a,
0xb3,0x5c,0x0a,
0x32,0x32,0x32,0x32,0x0a
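
The byte swapping can be undone mechanically. The following is a minimal Python sketch, assuming the dump came from a little-endian machine where "od -t x" printed 4-byte words; the names words, size and raw are only illustrative.

```python
import struct

# Words exactly as printed by `od -t x /tmp/tt` in the quoted message.
words = ["31313131", "a5a8a60a", "5cb30a5c", "3232320a", "00000a32"]

# Final offset printed by od (octal 22 = 18 bytes); the last word is padded.
size = 0o22

# Assumption: little-endian host, so each 32-bit word is stored low byte first.
raw = b"".join(struct.pack("<I", int(w, 16)) for w in words)[:size]

# Print the bytes in file order, one hex pair per byte.
print(raw.hex(" "))
```

For the dump in this message, raw holds the same 18 bytes as the sequence listed above.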

Here we see a55c and b35c, which should never appear in EUC-TW: the
second byte of each pair (0x5c) is lower than 0x80, while every byte of
an EUC-TW multibyte character is 0x80 or higher.
I guess they are BIG5. If my guess is correct, you could set the
client encoding to BIG5 ("\encoding BIG5" in psql) and get the correct
result.
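
The reasoning above can be checked with a short Python sketch. It assumes a simplified view of EUC-TW (two-byte characters with both bytes in 0xa1-0xfe, or four-byte characters starting with 0x8e) and uses Python's built-in big5 codec; the function name first_non_euc_tw is hypothetical.

```python
# The 18 bytes from /tmp/tt, in file order.
raw = bytes.fromhex("31313131 0a a6a8a55c 0a b35c 0a 32323232 0a")

def first_non_euc_tw(data):
    """Return the offset of the first sequence that cannot be EUC-TW, else None.

    Simplified rule set (an assumption of this sketch): ASCII bytes stand
    alone, 0xa1-0xfe starts a two-byte character whose second byte is also
    0xa1-0xfe, and 0x8e starts a four-byte character (plane byte 0xa1-0xb0,
    then two bytes in 0xa1-0xfe).
    """
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            i += 1
        elif b == 0x8E:
            seq = data[i:i + 4]
            ok = (len(seq) == 4 and 0xA1 <= seq[1] <= 0xB0
                  and all(0xA1 <= x <= 0xFE for x in seq[2:]))
            if not ok:
                return i
            i += 4
        elif 0xA1 <= b <= 0xFE:
            if i + 1 >= len(data) or not (0xA1 <= data[i + 1] <= 0xFE):
                return i
            i += 2
        else:
            return i
    return None

bad_offset = first_non_euc_tw(raw)
print("first non-EUC-TW sequence at offset:", bad_offset)

# The same bytes decode without error when treated as Big5.
text = raw.decode("big5")
print(text)
```

With the bytes from this message, the check stops at offset 7, which is the a5 5c pair, while the Big5 decode succeeds.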
--
Tatsuo Ishii
