SUMMARY: Add support for non-default charset encodings to the Postgres JDBC driver.

SUBMITTED BY: William Webber

VERSION PATCHED: Postgresql version 7.0.2

PATCH APPLICATION:

From the directory postgresql-7.0.2/src/interfaces, run:

    patch -p0 < jdbc-charset.patch

DESCRIPTION OF PATCH:

Although the Postgres backend supports multi-byte character sets, the JDBC driver currently offers no way to specify a character set other than Java's platform default (typically Latin1) when sending strings to and receiving them from the database. Characters with Unicode values higher than 0xff (that is, not representable in a single byte) are therefore mangled to '?'.

The JDBC API has no explicit method or parameter for specifying the character set used to communicate with the database. However, Sun has added the "charSet" Connection property to the 1.2 JDBC/ODBC bridge as a mechanism for setting the character encoding (see http://java.sun.com/products//jdk/1.2/docs/guide/jdbc/bridge.html). This patch adds the same functionality to the Postgres JDBC driver.

Once applied, this patch allows the JDBC driver to be used transparently with languages that use multi-byte character set encodings, and with Unicode (UTF-8).

USAGE EXAMPLE:

    {
        ...
        Properties info = new Properties();
        info.put("user", "foo");
        info.put("password", "bar");
        info.put("charSet", "utf-8");
        Connection connection = DriverManager.getConnection(url, info);
    }

The connection and all statements derived from it can now be used with arbitrary Unicode strings.

COMPATIBILITY:

As mentioned, this patch follows the usage adopted by Sun in their 1.2 JDBC/ODBC bridge. It works with JDBC versions 1 and 2. The API has not been changed, and no existing code will be affected.

OTHER CONSIDERATIONS:

To work properly, the database should have been set up to use the same encoding as JDBC, using the "-E" argument to createdb. This patch does not check for that.
(In many cases, especially if UTF-8 is the encoding used by JDBC, things more or less work anyway.)

DOCUMENTATION:

Javadoc documentation has been included with the patch.

TESTING:

This patch has been tested successfully with the UTF-8 encoding and strings containing the full range of Unicode characters from 0x0001 to 0xffff. The testing code is included in the patch. The patch has not been tested with other (non-Unicode) character sets.
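
ILLUSTRATION OF THE ENCODING PROBLEM:

The mangling described under DESCRIPTION OF PATCH can be reproduced without a database. The following is a minimal sketch, not code from the patch: the class name CharsetRoundTrip and its methods are hypothetical, and it uses the java.nio.charset API from later Java releases rather than anything available in JDK 1.2. It encodes a string to bytes and decodes it again, which is roughly what happens to a string on its way to the backend and back. It also skips lone surrogate code units (0xd800 to 0xdfff), which cannot be encoded on their own; that is an assumption of this sketch, not a statement about the patch's own test code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
    /** Encodes text to bytes in the given charset, then decodes it back. */
    public static String roundTrip(String text, Charset charset) {
        byte[] wire = text.getBytes(charset);
        return new String(wire, charset);
    }

    /**
     * Counts the characters in 0x0001..0xffff that do not survive a round
     * trip. Lone surrogates are skipped (an assumption of this sketch).
     */
    public static int countMangled(Charset charset) {
        int mangled = 0;
        for (int c = 0x0001; c <= 0xffff; c++) {
            if (Character.isSurrogate((char) c)) {
                continue;
            }
            String original = String.valueOf((char) c);
            if (!roundTrip(original, charset).equals(original)) {
                mangled++;
            }
        }
        return mangled;
    }

    public static void main(String[] args) {
        // U+00E9 fits in one Latin1 byte; U+65E5 and U+672C do not.
        String text = "caf\u00e9 \u65e5\u672c";

        String viaLatin1 = roundTrip(text, StandardCharsets.ISO_8859_1);
        String viaUtf8 = roundTrip(text, StandardCharsets.UTF_8);

        System.out.println("Latin1 preserves text: " + viaLatin1.equals(text));
        System.out.println("UTF-8 preserves text:  " + viaUtf8.equals(text));
        System.out.println("Mangled by UTF-8:      "
                + countMangled(StandardCharsets.UTF_8));
    }
}
```

With Latin1, the two characters above 0xff come back as '?', while the UTF-8 round trip returns the original string unchanged. This is the difference that setting the "charSet" property to "utf-8" is meant to make for strings passing through the driver.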