Re: Very strange Error in Updates

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Oliver Jowett <oliver(at)opencloud(dot)com>
Cc: "Dario V(dot) Fassi" <software(at)sistemat(dot)com(dot)ar>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Very strange Error in Updates
Date: 2004-07-15 14:16:17
Message-ID: 20559.1089900977@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-jdbc

Oliver Jowett <oliver(at)opencloud(dot)com> writes:
> The JDBC driver always speaks UNICODE when it can, since that matches
> Java's internal string representation. I suspect that what's happening is:

> 0) the driver sets client_encoding = UNICODE during connection setup

Right.

> 1) the driver encodes the parameter as UNICODE (== UTF8); for characters
> above 127 this encoding will result in more than one byte per character.

Right.

> 2) the server converts from client_encoding UNICODE to database encoding
> SQL_ASCII; for characters that are invalid in SQL_ASCII (>127) it does
> some arbitary conversion, probably just copying the illegal values
> unchanged.

Not really. SQL_ASCII encoding basically means "we don't know what this
data is, just store it verbatim". So the UTF-8 string sent by the
driver is stored verbatim.

> 3) you end up with extra characters in the resulting value which exceeds
> the varchar's size.

Right. Since the server does not know what encoding is in use, it falls
back to the assumption that 1 character == 1 byte, under which
assumption the string violates the varchar(30) constraint.

Had the server known which encoding was in use, it would have counted
the characters correctly.

> The solution is to use a database encoding that matches your data.

Actually, if you intend to access the database primarily through JDBC,
it'd be best to use server encoding UNICODE. The JDBC driver will
always want UNICODE on the wire, and I see no reason to force extra
character set conversions. Non-UNICODE-aware clients can be handled by
setting client_encoding properly.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2004-07-15 14:32:01 Re: [PATCHES] serverlog rotation/functions
Previous Message Zeugswetter Andreas SB SD 2004-07-15 13:00:24 Re: Point in Time Recovery

Browse pgsql-jdbc by date

  From Date Subject
Next Message Dario V. Fassi 2004-07-15 16:39:58 Re: Very strange Error in Updates
Previous Message Stefano Bonnin 2004-07-15 14:11:35 Re: SSL Problem