
Re: 8.0.0beta4: "copy" and "client_encoding"

From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: mbch67(at)yahoo(dot)com
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: 8.0.0beta4: "copy" and "client_encoding"
Date: 2004-11-05 21:41:43
Message-ID: 418BF397.4070000@opencloud.com
Lists: pgsql-jdbc
mbch67(at)yahoo(dot)com wrote:

> 1. I set LATIN1 as the database (postgresql.conf) default client
> encoding. Why does COPY, executed via JDBC, not use the right encoding?
> => To me it seems to be a backend problem. Should this be addressed in
> another posting list?

The postgresql.conf setting is a default that can be overridden on a 
per-client basis. JDBC overrides the default when it connects. This is 
normal.

> 2. Was the decision to disable the "SET CLIENT_ENCODING" command
> really a good idea? What about if I am running a server using UNICODE
> to store text, my default client encoding is LATIN1 and I want to
> import a Korean encoded text file using COPY via JDBC? There is no way
> to tell COPY what encoding the input file is based on.
> In order to be compliant with PSQL I suggest to reactivate the
> disabled "SET CLIENT ENCODING" for JDBC.

It's a good idea in the sense that if you SET CLIENT_ENCODING, you will 
break the JDBC driver in nonobvious ways. The check is there as an extra 
safety net. I'd be OK with a URL parameter to disable the check so that 
expert users can SET CLIENT_ENCODING at their own risk, but I don't want 
the check disabled by default.

It would be theoretically possible for the JDBC driver to track 
client_encoding and adjust the encoding it uses accordingly, but:

1) someone needs to actually implement that
2) it is not clear exactly when the encoding changes with respect to 
receiving the ParameterStatus message (this is only an issue if there 
are encodings where the contents of the ParameterStatus message would 
change in the new encoding)
3) it results in an extra round of transcoding (i.e. db encoding -> 
client encoding -> unicode, rather than just db encoding -> unicode)
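
To make point 3 concrete, here is a minimal sketch in plain Java of the 
two decoding paths. It does not touch the driver or the server: the 
class and method names are made up for illustration, and the 
server-side conversion is only simulated with String.getBytes, which is 
an assumption about the shape of the work, not about how the backend 
does it.

```java
import java.nio.charset.Charset;

class TranscodingRounds {
    // Direct path: bytes in the database encoding are decoded straight
    // into a Java (unicode) String. One conversion.
    static String direct(byte[] dbBytes, Charset dbEncoding) {
        return new String(dbBytes, dbEncoding);
    }

    // Two-step path: db encoding -> client encoding -> unicode.
    // The first step stands in for the conversion the server would do;
    // the second is the decode the driver would then have to perform.
    static String viaClientEncoding(byte[] dbBytes, Charset dbEncoding,
                                    Charset clientEncoding) {
        byte[] wireBytes = new String(dbBytes, dbEncoding).getBytes(clientEncoding);
        return new String(wireBytes, clientEncoding);
    }
}
```

For text that both encodings can represent, the two paths produce the 
same String; the second one just does an extra conversion to get there.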

Given that the only thing that we've seen that depends on 
client_encoding so far is COPY (and even that has problems), I think the 
right solution is to fix COPY, not go to a lot of extra work to support 
arbitrary client_encoding values.

Are there any other cases where client_encoding needs to be modified by 
a JDBC user? It really seems to me that client_encoding is an 
implementation detail that JDBC users should not need to worry about, 
given that Java already has standard mechanisms for dealing with 
encodings (namely "turn everything into unicode strings internally").

====

Also, a couple of workarounds for your case that don't need driver 
modifications:

- force use of protocol version 2 by adding "?protocolVersion=2" to your 
connection URL; you will lose the benefits of version 3 but it should 
also defeat the client_encoding checks.
- transcode the file from LATIN1 to UNICODE (UTF8) on the server side 
before issuing the COPY.
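
The second workaround could be sketched roughly as follows, assuming 
you can run a small Java program on the machine that holds the file. 
The class name, method name and paths are hypothetical; the source 
charset is a parameter, so the same code would serve for an encoding 
other than LATIN1 as long as the JVM supports it.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CopyFileTranscoder {
    // Reads 'source' as text in 'sourceCharset' and rewrites it to
    // 'target' as UTF-8. An existing target file is overwritten.
    static void transcode(Path source, Charset sourceCharset, Path target)
            throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, sourceCharset);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int n;
            // Copy decoded characters; the writer re-encodes them as UTF-8.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}
```

You would call it as something like 
CopyFileTranscoder.transcode(latin1File, StandardCharsets.ISO_8859_1, utf8File) 
and then point the COPY at the UTF-8 output file.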

-O
