Re: [HACKERS] MULTIBYTE and SQL_ASCII (was Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: barry(at)xythos(dot)com, pgsql-hackers(at)postgresql(dot)org, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: [HACKERS] MULTIBYTE and SQL_ASCII (was Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)
Date: 2001-05-06 07:47:11
Message-ID: 20010506164711R.t-ishii@sra.co.jp
Lists: pgsql-hackers pgsql-jdbc

> > Thus I would be happy if getdatabaseencoding() returned 'UNKNOWN' or
> > something similar when in fact it doesn't know what the encoding is
> > (i.e. when not compiled with multibyte).

Is that ok for Java? I thought Java needs to know the encoding
beforehand so that it could convert to/from Unicode.

> I have a philosophical difference with this: basically, I think that
> since SQL_ASCII is the default value, you probably ought to assume that
> it's not too trustworthy. The software can *never* be said to KNOW what
> the data encoding is; at most it knows what it's been told, and in the
> case of a default it probably hasn't been told anything. I'd argue that
> SQL_ASCII should be interpreted in the way you are saying "UNKNOWN"
> ought to be: ie, it's an unspecified 8-bit encoding (and from there
> it's not much of a jump to deciding to treat it as LATIN1, if you're
> forced to do conversion to Unicode or whatever). Certainly, seeing
> SQL_ASCII from the server is not license to throw away data, which is
> what JDBC is doing now.
>
> > PS. Note that if multibyte is enabled, the functionality that is being
> > complained about here in the jdbc client is apparently ok for the server
> > to do. If you insert a value into a text column on a SQL_ASCII database
> > with multibyte enabled and that value contains 8bit characters, those
> > 8bit characters will be quietly replaced with a dummy character since
> > they are invalid for the SQL_ASCII 7bit character set.
>
> I have not tried it, but if the backend does that then I'd argue that
> that's a bug too.

I suspect the JDBC driver is responsible for the problem Barry has
reported (I tried to reproduce the problem using psql, without
success).

From interfaces/jdbc/org/postgresql/Connection.java:

> if (dbEncoding.equals("SQL_ASCII")) {
> dbEncoding = "ASCII";

BTW, even if the backend behaves like that, I don't think it's a
bug, since SQL_ASCII is nothing more than an ASCII encoding.
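
To illustrate why mapping SQL_ASCII to Java's "ASCII" loses 8-bit data,
here is a minimal sketch in modern Java. It is not the driver's code:
the class name AsciiLossDemo, the helper roundTrip, and the sample bytes
are made up for illustration. It decodes bytes to a String and encodes
them back with the same charset, once as US-ASCII and once as
ISO-8859-1 (LATIN1).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiLossDemo {
    // Hypothetical helper: decode raw bytes with cs, then encode back with cs.
    static byte[] roundTrip(byte[] raw, Charset cs) {
        return new String(raw, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // Sample data (assumed): "caf" followed by 0xE9, a byte with the high bit set.
        byte[] raw = { 'c', 'a', 'f', (byte) 0xE9 };

        byte[] viaAscii = roundTrip(raw, StandardCharsets.US_ASCII);
        byte[] viaLatin1 = roundTrip(raw, StandardCharsets.ISO_8859_1);

        // US-ASCII cannot represent 0xE9, so the byte is replaced on the way through.
        System.out.println("ASCII preserved:  " + Arrays.equals(raw, viaAscii));
        // ISO-8859-1 maps every byte value to a character, so nothing is lost.
        System.out.println("LATIN1 preserved: " + Arrays.equals(raw, viaLatin1));
    }
}
```

With US-ASCII the high-bit byte comes back as a replacement character
rather than the original value; with ISO-8859-1 the bytes survive
unchanged.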

> To my mind, a MULTIBYTE backend operating in
> SQL_ASCII encoding ought to behave the same as a non-MULTIBYTE backend:
> transparent pass-through of characters with the high bit set. But I'm
> not a multibyte guru. Comments anyone?

If you expect that behavior, I think an encoding name like 'UNKNOWN'
would be more appropriate. (SQL_)ASCII is just ASCII, IMHO.
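
As a rough sketch of how a client could act on that naming, assuming
the server reported a hypothetical 'UNKNOWN' encoding (no such name
exists today), the choice of Java charset might look like this. The
class EncodingChoice and the method charsetFor are invented for
illustration and are not part of the JDBC driver.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingChoice {
    // Hypothetical mapping from a server-reported encoding name to a Java charset.
    static Charset charsetFor(String dbEncoding) {
        switch (dbEncoding) {
            case "SQL_ASCII":
                // Taken literally: plain 7-bit ASCII.
                return StandardCharsets.US_ASCII;
            case "UNKNOWN":
                // Assumed name for "unspecified 8-bit data": ISO-8859-1 lets
                // every byte pass through unchanged.
                return StandardCharsets.ISO_8859_1;
            case "LATIN1":
                return StandardCharsets.ISO_8859_1;
            default:
                // Other encodings are out of scope for this sketch.
                throw new IllegalArgumentException("unhandled encoding: " + dbEncoding);
        }
    }

    public static void main(String[] args) {
        System.out.println(charsetFor("SQL_ASCII"));
        System.out.println(charsetFor("UNKNOWN"));
    }
}
```

Whether SQL_ASCII itself should instead be treated as the pass-through
case is exactly the open question in this thread; the sketch only shows
the alternative in which the two names stay distinct.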
--
Tatsuo Ishii
