Re: 8.0.0beta4: "copy" and "client_encoding"

From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: mbch67(at)yahoo(dot)com
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: 8.0.0beta4: "copy" and "client_encoding"
Date: 2004-11-05 21:41:43
Message-ID: 418BF397.4070000@opencloud.com
Lists: pgsql-jdbc

mbch67(at)yahoo(dot)com wrote:

> 1. I set LATIN1 as the database (postgresql.conf) default client
> encoding. Why does COPY, executed via JDBC, not use the right encoding?
> => To me it seems to be a backend problem. Should this be addressed on
> another mailing list?

The postgresql.conf setting is a default that can be overridden on a
per-client basis. JDBC overrides the default when it connects. This is
normal.
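For context: with protocol version 3, a client sends the settings it wants as name/value pairs in the StartupMessage, which is how a per-connection client_encoding can override the postgresql.conf default. A minimal sketch of building that message follows. It assumes, as this thread implies, that the driver requests client_encoding=UNICODE at startup; the class name and the user/database values are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not the driver's own code.
class StartupSketch {
    // Protocol 3.0: major version in the high 16 bits, minor version in the low 16 bits.
    static final int PROTOCOL_3_0 = 3 << 16;

    static byte[] startupMessage(Map<String, String> params) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Map.Entry<String, String> e : params.entrySet()) {
            writeCString(body, e.getKey());
            writeCString(body, e.getValue());
        }
        body.write(0); // a zero byte terminates the name/value list
        byte[] payload = body.toByteArray();
        // The length field counts itself (4 bytes) and the version field (4 bytes).
        // ByteBuffer is big-endian by default, matching network byte order.
        ByteBuffer msg = ByteBuffer.allocate(8 + payload.length);
        msg.putInt(8 + payload.length);
        msg.putInt(PROTOCOL_3_0);
        msg.put(payload);
        return msg.array();
    }

    // Writes a NUL-terminated string.
    static void writeCString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
        out.write(0);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("user", "alice");    // hypothetical user name
        params.put("database", "test"); // hypothetical database name
        // Assumption: the driver asks for UNICODE here, overriding the server default.
        params.put("client_encoding", "UNICODE");
        System.out.println("startup message is " + startupMessage(params).length + " bytes");
    }
}
```

Because the setting travels with the connection request, the server-wide default only applies to clients that do not send their own value.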

> 2. Was the decision to disable the "SET CLIENT_ENCODING" command
> really a good idea? What if I am running a server that uses UNICODE
> to store text, my default client encoding is LATIN1, and I want to
> import a Korean-encoded text file using COPY via JDBC? There is no way
> to tell COPY which encoding the input file is in.
> To be consistent with psql, I suggest re-enabling the
> disabled "SET CLIENT_ENCODING" for JDBC.

It's a good idea in the sense that if you SET CLIENT_ENCODING, you will
break the JDBC driver in nonobvious ways. The check is there as an extra
safety net. I'd be OK with a URL parameter to disable the check so that
expert users can SET CLIENT_ENCODING at their own risk, but I don't want
the check disabled by default.
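The safety net amounts to inspecting each ParameterStatus the server reports and refusing to continue if client_encoding is no longer what the driver asked for. A minimal sketch, assuming a hypothetical `EncodingGuard` class and an opt-out flag of the kind proposed above (later pgjdbc releases did add an `allowEncodingChanges` connection property for this purpose). The real driver's code and exception types differ; `IllegalStateException` stands in here.

```java
import java.util.Locale;

// Hypothetical guard; names and behavior are illustrative only.
class EncodingGuard {
    // Hypothetical opt-out, e.g. set from a URL parameter by expert users.
    private final boolean allowEncodingChanges;

    EncodingGuard(boolean allowEncodingChanges) {
        this.allowEncodingChanges = allowEncodingChanges;
    }

    /** Called for every ParameterStatus message received from the server. */
    void onParameterStatus(String name, String value) {
        if (!"client_encoding".equals(name)) {
            return; // this sketch only checks client_encoding
        }
        String v = value.toUpperCase(Locale.ROOT);
        if (v.equals("UNICODE") || v.equals("UTF8")) {
            return; // still the encoding the driver decodes with
        }
        if (allowEncodingChanges) {
            return; // the user explicitly accepted the risk
        }
        throw new IllegalStateException(
                "client_encoding changed to " + value + "; the driver requires UNICODE");
    }
}
```

With the flag off, `SET CLIENT_ENCODING TO 'LATIN1'` would be caught as soon as the server reports the change; with it on, the change passes silently and any resulting mis-decoding is the user's responsibility.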

It would be theoretically possible for the JDBC driver to track
client_encoding and adjust the encoding it uses accordingly, but:

1) someone needs to actually implement that
2) it is not clear exactly when the encoding changes with respect to
receiving the ParameterStatus message (this is only an issue if there
are encodings where the contents of the ParameterStatus message would
change in the new encoding)
3) it results in an extra round of transcoding (i.e. db encoding ->
client encoding -> unicode, rather than just db encoding -> unicode)
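To make points 2 and 3 concrete, here is a minimal sketch of what tracking client_encoding might look like on the client side. `TrackingDecoder` and its three-entry charset table are hypothetical; the sketch only shows where the ParameterStatus ordering question and the extra conversion step would appear.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

// Hypothetical class: the driver discussed in this thread does not work this way.
class TrackingDecoder {
    // Deliberately tiny mapping from PostgreSQL encoding names to Java charsets.
    private static final Map<String, Charset> PG_TO_JAVA = Map.of(
            "UNICODE", StandardCharsets.UTF_8,
            "UTF8", StandardCharsets.UTF_8,
            "LATIN1", StandardCharsets.ISO_8859_1);

    // Charset currently used to decode text sent by the server.
    private Charset current = StandardCharsets.UTF_8;

    /** Handles the body of a ParameterStatus message: name\0value\0. */
    void onParameterStatus(byte[] body) {
        // Point 2: the message itself is decoded with the *old* charset, which is
        // only safe if name and value are encoded identically in both encodings.
        int nameEnd = indexOfNul(body, 0);
        int valueEnd = indexOfNul(body, nameEnd + 1);
        String name = new String(body, 0, nameEnd, current);
        String value = new String(body, nameEnd + 1, valueEnd - nameEnd - 1, current);
        if (name.equals("client_encoding")) {
            Charset cs = PG_TO_JAVA.get(value.toUpperCase(Locale.ROOT));
            if (cs == null) {
                throw new IllegalArgumentException("no Java charset for " + value);
            }
            current = cs;
        }
    }

    /** Point 3: if client_encoding is not UNICODE, text has already passed through
     *  that intermediate encoding on the server before being decoded here. */
    String decode(byte[] columnBytes) {
        return new String(columnBytes, current);
    }

    private static int indexOfNul(byte[] b, int from) {
        for (int i = from; i < b.length; i++) {
            if (b[i] == 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("missing NUL terminator");
    }
}
```

Every encoding the server might report would need an entry in the table, and the ordering question in point 2 is left unresolved; both are reasons this is more work than it first appears.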

Given that the only thing that we've seen that depends on
client_encoding so far is COPY (and even that has problems), I think the
right solution is to fix COPY, not go to a lot of extra work to support
arbitrary client_encoding values.

Are there any other cases where client_encoding needs to be modified by
a JDBC user? It really seems to me that client_encoding is an
implementation detail that JDBC users should not need to worry about,
given that Java already has standard mechanisms for dealing with
encodings (namely "turn everything into unicode strings internally").

====

Also, a couple of workarounds for your case that don't need driver
modifications:

- force use of protocol version 2 by adding "?protocolVersion=2" to your
connection URL; you will lose the benefits of version 3 but it should
also defeat the client_encoding checks.
- transcode the file from LATIN1 to UNICODE (UTF8) on the server side
before issuing the COPY.
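The second workaround can be done with any transcoding tool. As one option, a minimal Java sketch follows; the class and method names are hypothetical, and it must run on the machine where the file lives (the server, for a COPY that reads a server-side file). For a Korean source file, ISO_8859_1 would be replaced by the file's actual charset, e.g. `Charset.forName("EUC-KR")` if the JRE provides it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for preparing a file before COPY.
class TranscodeForCopy {
    /** Rewrites a LATIN1 text file as UTF-8 so COPY can read it under UNICODE. */
    static void latin1ToUtf8(Path in, Path out) throws IOException {
        // Decoding produces a Java String (Unicode internally); encoding writes UTF-8.
        String text = new String(Files.readAllBytes(in), StandardCharsets.ISO_8859_1);
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TranscodeForCopy input-latin1.txt output-utf8.txt
        latin1ToUtf8(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

This reads the whole file into memory, which keeps the sketch short; for very large files a streaming `Reader`/`Writer` pair would be the better choice.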

-O
