Re: [HACKERS] MULTIBYTE and SQL_ASCII (was Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)

From: Barry Lind <barry(at)xythos(dot)com>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: [HACKERS] MULTIBYTE and SQL_ASCII (was Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)
Date: 2001-05-08 01:10:00
Message-ID: 3AF74768.8060807@xythos.com
Lists: pgsql-hackers pgsql-jdbc

Tatsuo Ishii wrote:

>>> Thus I would be happy if getdatabaseencoding() returned 'UNKNOWN' or
>>> something similar when in fact it doesn't know what the encoding is
>>> (i.e. when not compiled with multibyte).
>>
>
> Is that ok for Java? I thought Java needs to know the encoding
> beforehand so that it could convert to/from Unicode.

That is actually the original issue that started this thread. If you
want the full thread, see the jdbc mailing list archive. A user was
complaining that on a database without multibyte enabled he could
insert and retrieve 8bit characters through psql, but through jdbc the
8bit characters were converted to ?'s.

I then explained why this was happening: the db returns SQL_ASCII as the
db character set when not compiled with multibyte, so jdbc uses that
(7bit) character set to convert to unicode.
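
To make the effect concrete, here is a minimal sketch of what happens when
8bit data is pushed through a 7bit ASCII conversion in Java. It is not the
driver's code; the class and method names are hypothetical, and the behavior
shown is that of a current JVM (a decoded high byte becomes U+FFFD, which
typically displays as '?', and an encoded non-ASCII character becomes a
literal '?').

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the jdbc driver.
class AsciiLossDemo {
    // Decode raw bytes as 7bit ASCII, as a client would if it trusted "SQL_ASCII".
    static String decodeAsAscii(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // Encode a Java string as 7bit ASCII before sending it to the server.
    static byte[] encodeAsAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "caf" followed by 0xE9, which is e-acute in LATIN1.
        byte[] fromServer = { 'c', 'a', 'f', (byte) 0xE9 };
        String decoded = decodeAsAscii(fromServer);
        // The high byte is not valid ASCII, so it is replaced on decode.
        System.out.println(Integer.toHexString(decoded.charAt(3))); // prints "fffd"

        // Going the other way, the non-ASCII character is replaced on encode.
        byte[] toServer = encodeAsAscii("caf\u00E9");
        System.out.println((char) toServer[3]); // prints "?"
    }
}
```

Either direction loses the original byte, which matches what the user saw.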

Tom suggested that it would make more sense for jdbc to use LATIN1 when
the database reports SQL_ASCII, so that most users will see 'correct'
behavior in a non-multibyte database; currently you need to enable
multibyte support in order to use 8bit characters with jdbc.
Jdbc could easily be changed to treat SQL_ASCII as LATIN1, but I don't
feel that is an appropriate solution for the reasons outlined in this
thread (thus the suggestions for UNKNOWN, or for some way for the client
to determine whether multibyte is enabled).
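
The two options under discussion can be sketched as a small lookup from the
name the server reports to a Java charset. Everything here is an assumption
for illustration: the class, the method, and the flag are hypothetical, the
server does not report 'UNKNOWN' today, and choosing a byte-preserving
fallback for 'UNKNOWN' is this sketch's guess rather than anything agreed in
the thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the driver's actual code.
class EncodingChoice {
    // dbEncoding: the name reported by getdatabaseencoding().
    // asciiMeansLatin1: true models Tom's suggestion, false the current behavior.
    static Charset charsetFor(String dbEncoding, boolean asciiMeansLatin1) {
        switch (dbEncoding) {
            case "LATIN1":
                return StandardCharsets.ISO_8859_1;
            case "SQL_ASCII":
                return asciiMeansLatin1
                        ? StandardCharsets.ISO_8859_1
                        : StandardCharsets.US_ASCII;
            case "UNKNOWN":
                // Proposed value only. Assumption: keep every byte intact.
                return StandardCharsets.ISO_8859_1;
            default:
                // Other encodings are out of scope for this sketch.
                throw new IllegalArgumentException("unhandled encoding: " + dbEncoding);
        }
    }

    public static void main(String[] args) {
        System.out.println(charsetFor("SQL_ASCII", false)); // prints "US-ASCII"
        System.out.println(charsetFor("SQL_ASCII", true));  // prints "ISO-8859-1"
    }
}
```

The difference between the two proposals is where the guess lives: with the
flag, the client guesses what SQL_ASCII means; with a distinct 'UNKNOWN'
value, the server says outright that it has not been told.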

>
>> I have a philosophical difference with this: basically, I think that
>> since SQL_ASCII is the default value, you probably ought to assume that
>> it's not too trustworthy. The software can *never* be said to KNOW what
>> the data encoding is; at most it knows what it's been told, and in the
>> case of a default it probably hasn't been told anything. I'd argue that
>> SQL_ASCII should be interpreted in the way you are saying "UNKNOWN"
>> ought to be: ie, it's an unspecified 8-bit encoding (and from there
>> it's not much of a jump to deciding to treat it as LATIN1, if you're
>> forced to do conversion to Unicode or whatever). Certainly, seeing
>> SQL_ASCII from the server is not license to throw away data, which is
>> what JDBC is doing now.
>>
>>> PS. Note that if multibyte is enabled, the functionality that is being
>>> complained about here in the jdbc client is apparently ok for the server
>>> to do. If you insert a value into a text column on a SQL_ASCII database
>>> with multibyte enabled and that value contains 8bit characters, those
>>> 8bit characters will be quietly replaced with a dummy character since
>>> they are invalid for the SQL_ASCII 7bit character set.
>>
>> I have not tried it, but if the backend does that then I'd argue that
>> that's a bug too.
>
>
> I suspect the JDBC driver is responsible for the problem Barry has
> reported (I have tried to reproduce the problem using psql without
> success).
>
> From interfaces/jdbc/org/postgresql/Connection.java:
>
>> if (dbEncoding.equals("SQL_ASCII")) {
>>     dbEncoding = "ASCII";
>
>
> BTW, even if the backend behaves like that, I don't think it's a
> bug, since SQL_ASCII is nothing more than an ascii encoding.

I believe Tom's point is that this isn't true if multibyte is not
enabled. SQL_ASCII then means whatever character set the client wants
to use against the server, since the server really doesn't care what
single byte data is being inserted into or selected from the database.
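
For a client that must convert to unicode, "the server doesn't care" only
works if the conversion keeps every byte value. A minimal sketch of that
property (class and method names are hypothetical): LATIN1 maps all 256 byte
values one-to-one onto characters, so data survives a round trip even if it
was never really LATIN1, while 7bit ASCII does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the jdbc driver.
class PassThroughDemo {
    // Every possible single-byte value, 0x00 through 0xFF.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    // True if decoding and re-encoding with cs returns the original bytes.
    static boolean roundTrips(byte[] raw, Charset cs) {
        String decoded = new String(raw, cs);
        return Arrays.equals(raw, decoded.getBytes(cs));
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println(roundTrips(all, StandardCharsets.ISO_8859_1)); // prints "true"
        System.out.println(roundTrips(all, StandardCharsets.US_ASCII));   // prints "false"
    }
}
```

Note that this only shows the bytes come back unchanged; whether the
characters displayed in between are the right ones still depends on what
encoding the data was actually in.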

>
>> To my mind, a MULTIBYTE backend operating in
>> SQL_ASCII encoding ought to behave the same as a non-MULTIBYTE backend:
>> transparent pass-through of characters with the high bit set. But I'm
>> not a multibyte guru. Comments anyone?
>
>
> If you expect that behavior, I think the encoding name 'UNKNOWN' or
> something like that seems more appropriate. (SQL_)ASCII is just
> ascii IMHO.

I agree.

>
> --
> Tatsuo Ishii
>
--Barry
