Re: getTables() doesn't handle umlauts correctly

From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: getTables() doesn't handle umlauts correctly
Date: 2010-11-23 14:22:10
Message-ID: icgimh$46u$1@dough.gmane.org
Lists: pgsql-jdbc

Kris Jurka, 23.11.2010 09:13:
> As the discussion has shown, trying to determine who is at fault here
> is not trivial. The best way to show that postgresql (driver or
> server if you're seeing it in pgadmin too) is at fault is to create a
> test case creating the table and then querying the metadata. It would
> be helpful to use either a Java or PG escape code for the special
> character so it doesn't get mangled by either mail clients or build
> environments. Then use String.codePointAt to print out the actual
> data for both the table name used for construction and returned by
> the metadata. That would conclusively show that PG is at fault
> somewhere.
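Kris's suggestion (write the special character as an escape code, then compare code points) can be tried in isolation, without a database. This is a minimal sketch, assuming Java 8+ for String.codePoints(). The class name CodePointCheck and the hard-coded "returned" value are hypothetical stand-ins for the name a real test would read back from getTables().

```java
import java.util.Arrays;

public class CodePointCheck {

    // Return the Unicode code points of s (surrogate pairs count as one code point)
    static int[] codePoints(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // Escape for 'o umlaut', so neither the mail client nor the source file encoding can mangle it
        String expected = "umlaut_\u00f6";

        // Hypothetical value: a real test would take this from DatabaseMetaData.getTables()
        String returned = "umlaut_\uFFFD";

        System.out.println("expected: " + Arrays.toString(codePoints(expected)));
        System.out.println("returned: " + Arrays.toString(codePoints(returned)));
        System.out.println("equal:    " + Arrays.equals(codePoints(expected), codePoints(returned)));
    }
}
```

Comparing the two arrays position by position shows exactly which character was damaged: here the last entry is 246 in one and 65533 in the other.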

OK, this is my test program:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class UmlautTest {

    // Print every Unicode code point of s (handles surrogate pairs)
    private static void printCodePoints(String s) {
        System.out.print(" codepoints:");
        for (int i = 0; i < s.length();) {
            int cp = s.codePointAt(i);
            System.out.print(" " + cp);
            i += Character.charCount(cp);
        }
        System.out.println("");
    }

    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection("jdbc:postgresql://localhost:5432/postgres", "postgres", "postgres");
        Statement stmt = con.createStatement();

        // Both the table name and the column data contain umlauts
        stmt.executeUpdate("create table umlaut_ö (some_data varchar(10))");
        stmt.executeUpdate("insert into umlaut_ö (some_data) values ('öäü')");

        // 1. Read the table name back through DatabaseMetaData.getTables()
        ResultSet rs = con.getMetaData().getTables(null, "public", "umlaut%", null);
        if (rs.next()) {
            String name = rs.getString("TABLE_NAME");
            System.out.println("table name: " + name);
            printCodePoints(name);
        }
        rs.close();

        // 2. Umlauts sent in the WHERE clause must match the stored data
        rs = stmt.executeQuery("select count(*) from umlaut_ö where some_data = 'öäü'");
        if (rs.next()) {
            int count = rs.getInt(1);
            System.out.println("number of rows: " + count);
        }
        rs.close();

        // 3. Read the column data back and print its code points
        rs = stmt.executeQuery("select some_data from umlaut_ö");
        if (rs.next()) {
            String data = rs.getString(1);
            System.out.println("data: " + data);
            printCodePoints(data);
        }
        rs.close();

        stmt.executeUpdate("drop table umlaut_ö");

        stmt.close();
        con.close();
    }
}

The output on my computer is:

table name: umlaut_test_�
codepoints: 117 109 108 97 117 116 95 116 101 115 116 95 65533
number of rows: 1
data: öäü
codepoints: 246 228 252

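So it seems that the umlaut in the table name is returned in a different encoding than the data itself: the table name ends with code point 65533 (U+FFFD, the Unicode replacement character) instead of 246 (ö), while the data comes back correctly as 246 228 252 (ö ä ü).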
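One possible mechanism, and it is only an assumption (nothing above confirms it), is that the table name reaches the driver as bytes in a single-byte encoding such as ISO-8859-1 or WIN1252, where ö is the single byte 0xF6, while the driver decodes them as UTF-8. 0xF6 never occurs in valid UTF-8, so Java substitutes U+FFFD, which would match the 65533 above. The class ReplacementCharDemo and its helpers are hypothetical, written only to illustrate this.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {

    // Hypothetical helper: encode s with one charset, then decode the bytes
    // with another, simulating a client that assumes the wrong encoding
    static String roundTrip(String s, Charset encodeAs, Charset decodeAs) {
        byte[] bytes = s.getBytes(encodeAs);
        // new String(byte[], Charset) replaces malformed input with the
        // charset's replacement string, which is U+FFFD for UTF-8
        return new String(bytes, decodeAs);
    }

    static int lastCodePoint(String s) {
        return s.codePointBefore(s.length());
    }

    public static void main(String[] args) {
        // 'o umlaut' as a Java escape; in ISO-8859-1 it is the single byte 0xF6
        String name = "umlaut_\u00f6";

        // Assumed mismatch: Latin-1 bytes decoded as UTF-8
        String wrong = roundTrip(name, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // Matching encodings on both sides
        String right = roundTrip(name, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);

        System.out.println("decoded as UTF-8:      " + lastCodePoint(wrong)); // 65533
        System.out.println("decoded as ISO-8859-1: " + lastCodePoint(right)); // 246
    }
}
```

If the real cause is something else, the same helper can be reused to test other encoding pairs against the observed code points.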

Nevertheless, umlauts *sent* to the server are always handled correctly, both in table names and in column values.

This is with PostgreSQL 9.0.1 on Windows XP, using postgresql-9.0-801.jdbc4.jar.

Regards
Thomas
