[Fwd: Patch for MULTIBYTE and SQL_ASCII (was Re: [JDBC] Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)]]

From: Barry Lind <barry(at)xythos(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: [Fwd: Patch for MULTIBYTE and SQL_ASCII (was Re: [JDBC] Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)]]
Date: 2001-05-31 23:21:01
Message-ID: 3B16D1DD.5020103@xythos.com
Lists: pgsql-patches

The following patch for JDBC fixes an issue where the JDBC driver, running
against a non-multibyte database, loses 8-bit characters. With this patch,
the driver ignores the encoding reported by the database when multibyte
isn't enabled and uses the JVM default instead.

thanks,
--Barry

-------- Original Message --------
Subject: Re: [HACKERS] MULTIBYTE and SQL_ASCII (was Re: [JDBC] Re: A bug
with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)
Date: Fri, 25 May 2001 17:12:09 -0700
From: Barry Lind
To: Tatsuo Ishii, tgl(at)sss(dot)pgh(dot)pa(dot)us
References: <3AF74768(dot)8060807(at)xythos(dot)com>
<20010508110249R(dot)t-ishii(at)sra(dot)co(dot)jp> <3AF78113(dot)6080907(at)xythos(dot)com>
<20010509102305C(dot)t-ishii(at)sra(dot)co(dot)jp>

Tatsuo, Tom,

Since the two of you were the only ones who seemed to care about this
thread, I am addressing you directly. I want to come to some sort of
resolution. Since it doesn't appear that anything is going to be
changed in the backend code in 7.2 to address the issue here, I will
submit the attached patch to the jdbc code.

This patch uses the function pg_encoding_to_char(1) to determine whether
multibyte is enabled on the server (as suggested by Tatsuo). If it is not,
the driver will use the default JVM character set to convert data from
the backend. The current behaviour forces all data to 7-bit ASCII in the
non-multibyte case, because getdatabaseencoding() always returns SQL_ASCII
for non-multibyte databases.
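
For illustration, the decision described above can be sketched roughly as
follows. This is not the attached patch: the class and method names
(EncodingChooser, multibyteEnabled, chooseCharset) and the small encoding
table are hypothetical, and the sketch assumes the two strings have already
been fetched with "select pg_encoding_to_char(1)" and
"select getdatabaseencoding()". Falling back to the JVM default for an
unmapped encoding is also an assumption made only for this sketch.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the JDBC driver.
final class EncodingChooser {
    // Illustrative subset of backend-name to Java-name mappings (assumed).
    private static final Map<String, String> JAVA_NAMES = new HashMap<>();
    static {
        JAVA_NAMES.put("SQL_ASCII", "US-ASCII"); // 7-bit only when multibyte is on
        JAVA_NAMES.put("LATIN1", "ISO-8859-1");
        JAVA_NAMES.put("UNICODE", "UTF-8");
    }

    // probeResult is the value of pg_encoding_to_char(1). A server built
    // without multibyte reports SQL_ASCII even for a nonzero encoding id.
    static boolean multibyteEnabled(String probeResult) {
        return !"SQL_ASCII".equals(probeResult);
    }

    // Returns the name of the character set to use for converting backend data.
    static String chooseCharset(String userCharSet, String probeResult,
                                String dbEncoding) {
        // An explicit charSet property always wins.
        if (userCharSet != null) {
            return userCharSet;
        }
        // No multibyte support: the reported encoding is meaningless,
        // so use the JVM default.
        if (!multibyteEnabled(probeResult)) {
            return Charset.defaultCharset().name();
        }
        // Multibyte enabled: trust getdatabaseencoding().
        String javaName = JAVA_NAMES.get(dbEncoding);
        // Assumption for this sketch: unmapped encodings use the JVM default.
        return javaName != null ? javaName : Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Non-multibyte server: prints whatever the JVM default happens to be.
        System.out.println(chooseCharset(null, "SQL_ASCII", "SQL_ASCII"));
        // Multibyte server whose database really is SQL_ASCII.
        System.out.println(chooseCharset(null, "EUC_JP", "SQL_ASCII"));
    }
}
```

The probe is kept separate from getdatabaseencoding() because only the probe
can tell a non-multibyte server apart from a multibyte server whose database
is genuinely SQL_ASCII.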

If I don't hear anything, I will go ahead and submit this patch.

thanks for your help on this issue.

--Barry

Tatsuo Ishii wrote:

>>> Still I don't see what you want in the JDBC driver if
>>> PostgreSQL were to return "UNKNOWN" to indicate that the backend is not
>>> compiled with MULTIBYTE. Do you want exactly the same behavior as the
>>> pre-7.1 driver? i.e. read data from the PostgreSQL backend, assume its
>>> encoding is the Java client's default (set by locale or
>>> something else), and convert it to UTF-8. If so, that would make sense
>>> to me...
>>
>> My suggestion would be that if the jdbc client were able to determine that
>> the server character set was UNKNOWN (i.e. no multibyte), it would
>> then use some appropriate default character set to perform conversions
>> to UCS2 (LATIN1 would probably make the most sense as a default). The
>> jdbc driver would keep its existing behavior if the character set was
>> SQL_ASCII and multibyte was enabled (i.e. only support 7-bit characters,
>> just like the backend does).
>>
>> Note that the user is always able to override the character set used for
>> conversion by setting the charSet property.
>
>
> I see. However I would say we could not change the current behavior
> of the backend until 7.2 is out. It is our policy that we do not
> add or change existing functionality while we are in the minor release
> cycle.
>
> What about doing like this:
>
> 1. call pg_encoding_to_char(1) (actually any number except 0 is ok)
>
> 2. if it returns "SQL_ASCII", then you could assume that MULTIBYTE is
> not enabled.
>
> This is pretty ugly, but should work.
>
>> Tom also mentioned that it might be possible for the server to support
>> setting the character set for a database even when multibyte wasn't
>> enabled. That would then allow clients like jdbc to get a value from
>> non-multibyte enabled servers that would be more meaningful than the
>> current SQL_ASCII. If this were done, then the 'UNKNOWN' hack would
>> not be necessary.
>
>
> Tom's suggestion does not sound reasonable to me. If PostgreSQL is not
> built with MULTIBYTE, then there is no such concept as
> "encoding" in PostgreSQL because there is no code to handle
> encodings. Thus it would be meaningless to assign an "encoding" to a
> database if MULTIBYTE is not enabled.
> --
> Tatsuo Ishii
>

Attachment Content-Type Size
patch-1.diff text/plain 1.2 KB
