MULTIBYTE and SQL_ASCII (was Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Barry Lind <barry(at)xythos(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org, pgsql-jdbc(at)postgreSQL(dot)org
Subject: MULTIBYTE and SQL_ASCII (was Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)
Date: 2001-05-05 15:21:58
Message-ID: 5642.989076118@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-jdbc

[ thread renamed and cross-posted to pghackers, since this isn't only
about JDBC anymore ]

Barry Lind <barry(at)xythos(dot)com> writes:
> The basic issue I have is that the server is providing an API to the
> client to get the character encoding for the database and that API can
> report incorrect information to the client.

I don't have any objection to changing the system so that even a
non-MULTIBYTE server can store and return encoding settings.
(Presumably it should only accept encoding settings that correspond
to single-byte encodings.) That can't happen before 7.2, however,
as the necessary changes are a bit larger than I'd care to shoehorn
into a 7.1.* release.

> Thus I would be happy if getdatabaseencoding() returned 'UNKNOWN' or
> something similar when in fact it doesn't know what the encoding is
> (i.e. when not compiled with multibyte).

I have a philosophical difference with this: basically, I think that
since SQL_ASCII is the default value, you probably ought to assume that
it's not too trustworthy. The software can *never* be said to KNOW what
the data encoding is; at most it knows what it's been told, and in the
case of a default it probably hasn't been told anything. I'd argue that
SQL_ASCII should be interpreted in the way you are saying "UNKNOWN"
ought to be: i.e., it's an unspecified 8-bit encoding (and from there
it's not much of a jump to deciding to treat it as LATIN1, if you're
forced to do conversion to Unicode or whatever). Certainly, seeing
SQL_ASCII from the server is not license to throw away data, which is
what JDBC is doing now.
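
To illustrate the point about SQL_ASCII, here is a minimal sketch of how a client might decode bytes without discarding anything. It is not the JDBC driver's code. The function name `decode_server_bytes` and the small name-mapping table are hypothetical, and falling back to LATIN1 is just the assumption suggested above.

```python
# Hypothetical mapping from server-reported encoding names to Python codec
# names; only a few illustrative entries, not a complete list.
CODEC_FOR = {
    "LATIN1": "latin-1",
    "UNICODE": "utf-8",
}


def decode_server_bytes(raw: bytes, reported_encoding: str) -> str:
    """Decode bytes received from a server reporting `reported_encoding`."""
    name = reported_encoding.upper()
    if name == "SQL_ASCII":
        # Assumption: SQL_ASCII means "unspecified 8-bit encoding".
        # Latin-1 maps every byte 0x00-0xFF to exactly one code point,
        # so this decode cannot fail and loses no data.
        return raw.decode("latin-1")
    return raw.decode(CODEC_FOR[name])


def lossy_ascii_decode(raw: bytes) -> str:
    """For contrast: strict 7-bit handling that replaces high-bit bytes."""
    return raw.decode("ascii", errors="replace")
```

With the fallback, the original bytes can always be recovered by encoding the result back to Latin-1. The strict 7-bit decode turns every high-bit byte into the same replacement character, so the original data is gone.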

> PS. Note that if multibyte is enabled, the functionality that is being
> complained about here in the jdbc client is apparently ok for the server
> to do. If you insert a value into a text column on a SQL_ASCII database
> with multibyte enabled and that value contains 8bit characters, those
> 8bit characters will be quietly replaced with a dummy character since
> they are invalid for the SQL_ASCII 7bit character set.

I have not tried it, but if the backend does that then I'd argue that
that's a bug too. To my mind, a MULTIBYTE backend operating in
SQL_ASCII encoding ought to behave the same as a non-MULTIBYTE backend:
transparent pass-through of characters with the high bit set. But I'm
not a multibyte guru. Comments anyone?
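
The difference between the two behaviors can be sketched as follows. This is a toy model, not backend code. The function names are hypothetical, and `?` stands in for the unspecified "dummy character" purely as an assumption.

```python
DUMMY = ord("?")  # assumption: the actual dummy character is not specified


def store_replacing(value: bytes) -> bytes:
    """Model of the reported behavior: high-bit bytes become a dummy."""
    return bytes(b if b < 0x80 else DUMMY for b in value)


def store_passthrough(value: bytes) -> bytes:
    """Model of transparent pass-through: bytes are stored unchanged."""
    return bytes(value)
```

Under the replacing model two different inputs such as `b"na\xefve"` and `b"na\xeeve"` are stored identically, so the original value cannot be recovered. Pass-through keeps them distinct.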

regards, tom lane
