Re: BUG #1976: steps to reproduce BUG #1438: Non UTF-8 client encoding problem

From: Bill Shui <bill(dot)shui(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1976: steps to reproduce BUG #1438: Non UTF-8 client encoding problem
Date: 2005-10-22 07:12:00
Message-ID: 5cc04f210510220012l30082d73o@mail.gmail.com
Lists: pgsql-bugs

Hi,

I have the following scenario. I have two boxes (one Windows Server 2003
and one Debian sarge Linux).

The Debian box runs the PostgreSQL server, and the Windows box uses a
Chinese character set.

If I want to build an application on Windows (through ODBC), should
I connect to the server with client encoding set to EUC_CN or UNICODE?

On the server side, should I run initdb -E with EUC_CN or UNICODE?

Also, regarding the locale setting:
should I set --locale=zh_CN.UTF-8?

Thanks.
Bill

On 20/10/05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Stanislav Sukholet <ctac(at)osib(dot)so-cdu(dot)ru> writes:
> >> Can't reproduce this here. What locale settings are you using in the
> >> database? (Particularly lc_ctype and lc_messages)
>
> > mydb=> SHOW client_encoding ;
> > client_encoding
> > -----------------
> > KOI8
> > (1 запись)
>
> > mydb=> show LC_CTYPE;
> > lc_ctype
> > -------------
> > ru_RU.koi8r
> > (1 запись)
>
> > mydb=> show LC_MESSAGES;
> > lc_messages
> > -------------
> > ru_RU.koi8r
> > (1 запись)
>
> > mydb=> CREATE TABLE a (b INTEGER PRIMARY KEY);
> > ERROR: ignoring unconvertible UTF-8 character 0xd3cf
>
> OK, with that I can reproduce it in 7.4, but more recent releases
> produce a bunch of "WARNING: ignoring unconvertible UTF-8 character"
> notices and then complete the operation successfully.
>
> This is basically the same problem discussed in this thread:
> http://archives.postgresql.org/pgsql-patches/2005-08/msg00037.php
> namely that gettext() converts the translated error message to the
> encoding implied by LC_CTYPE ... but the error reporting machinery
> expects the string to be in the encoding specified for the database.
>
> I have applied a minor tweak to the 7.4 branch to make it behave more
> like the later releases, ie you get a WARNING not an ERROR. However
> this is certainly not really a solution --- the only reason the behavior
> isn't worse is that the ru_RU message catalog doesn't try to translate
> "ignoring unconvertible UTF-8 character" and so you don't get into the
> recursive failure discussed in the above thread.
>
> The bottom line is that this is one of several reasons why it's a bad
> idea to use a database encoding that's incompatible with the underlying
> locale settings. I doubt that we'll really be able to fix that until
> we replace all our dependence on the C library's locale facilities
> ... which is something that will probably happen someday, but don't
> hold your breath waiting :-(
>
> In short, if you want to use UTF8 database encoding, specify a
> UTF8-based locale setting when you initdb. Don't try to change
> the database encoding via -E.
>
> regards, tom lane
>
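
To illustrate the mismatch described above: the bytes 0xd3cf from the
error message decode to the two letters "со" in KOI8-R, but they are not a
valid UTF-8 sequence. The following is only a minimal Python sketch. It
assumes the database encoding was UTF-8 and that the offending bytes came
from a message gettext() had already converted to KOI8-R. The helper name
try_decode is hypothetical and used only for illustration.

```python
from typing import Optional

# The two bytes reported in the error message: 0xd3cf
raw = bytes.fromhex("d3cf")


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if data is invalid in that encoding.

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Interpreted in the encoding implied by LC_CTYPE (ru_RU.koi8r)
as_koi8r = try_decode(raw, "koi8_r")

# Interpreted in the encoding the error machinery is assumed to expect
as_utf8 = try_decode(raw, "utf-8")

print(repr(as_koi8r))  # 'со'
print(repr(as_utf8))   # None
```

The same bytes are readable text under one encoding and garbage under the
other, which is why the locale and the database encoding need to agree.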

--
Persistence is the twin sister of excellence. One is a matter of
quality; the other, a matter of time.
Marabel Morgan, The Electric Woman
