From: | Oliver Jowett <oliver(at)opencloud(dot)com> |
---|---|
To: | "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org> |
Subject: | patch: support unicode characters above U+10000 |
Date: | 2004-08-08 23:16:09 |
Message-ID: | 4116B439.60207@opencloud.com |
Lists: | pgsql-jdbc |
This patch adds support for translating UTF-8 representations of Unicode
characters at or above U+10000 (the supplementary planes) into UTF-16
surrogate pairs. Once the server supports these characters (see recent
discussion on -hackers), the driver should be able to process them
without problems (in theory..).
This translation behaviour is the same as what (at least) JDK 1.4 does
when decoding UTF-8 via a String constructor. To actually handle the
resulting surrogate pairs properly throughout the system you need a
1.5 JDK. See
http://java.sun.com/developer/technicalArticles/Intl/Supplementary/ for
some background.
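For illustration only, here is a minimal sketch of the translation described above: taking one 4-byte UTF-8 sequence and producing the UTF-16 surrogate pair. The class and method names are hypothetical and this is not the code from the attached patch; it assumes the four bytes have already been validated.

```java
public class SupplementarySketch {
    // Hypothetical helper, not the driver's decoder: converts one 4-byte
    // UTF-8 sequence (assumed to be well-formed) into a surrogate pair.
    static char[] decodeFourByte(byte b0, byte b1, byte b2, byte b3) {
        // Lead byte 11110xxx carries 3 payload bits; each continuation
        // byte 10xxxxxx carries 6, for 21 bits in total.
        int codePoint = ((b0 & 0x07) << 18)
                      | ((b1 & 0x3F) << 12)
                      | ((b2 & 0x3F) << 6)
                      |  (b3 & 0x3F);

        // Surrogate pairs encode the 20-bit offset from U+10000.
        int offset = codePoint - 0x10000;
        char high = (char) (0xD800 + (offset >> 10));    // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) is F0 9D 84 9E in UTF-8.
        char[] pair = decodeFourByte((byte) 0xF0, (byte) 0x9D, (byte) 0x84, (byte) 0x9E);
        System.out.println(Integer.toHexString(pair[0]) + " " + Integer.toHexString(pair[1]));
    }
}
```

The resulting two chars occupy two positions in a Java String, which is why code that counts or indexes characters needs the 1.5 code point APIs to treat them as one character.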
I also added checks for illegal encodings in the decoder, and added more
testcases for the decoder since I've broken it once before..
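As a rough sketch of what checks for illegal encodings can look like (the patch itself may check a different set of conditions, and the names here are hypothetical), the decoder below rejects bad lead bytes, truncated sequences, bad continuation bytes, overlong forms, encoded surrogates and values past U+10FFFF. It uses StringBuilder.appendCodePoint, so it assumes a 1.5 or later JDK.

```java
public class Utf8CheckSketch {
    // Hypothetical strict decoder, not the driver's code. Throws
    // IllegalArgumentException on any malformed input.
    static String decode(byte[] data) {
        StringBuilder out = new StringBuilder(data.length);
        int i = 0;
        while (i < data.length) {
            int start = i;
            int b = data[i++] & 0xFF;
            int cp;     // code point being assembled
            int extra;  // number of continuation bytes expected
            int min;    // smallest value this length may legally encode
            if (b < 0x80) {
                cp = b; extra = 0; min = 0;
            } else if (b >= 0xC0 && b < 0xE0) {
                cp = b & 0x1F; extra = 1; min = 0x80;
            } else if (b >= 0xE0 && b < 0xF0) {
                cp = b & 0x0F; extra = 2; min = 0x800;
            } else if (b >= 0xF0 && b < 0xF8) {
                cp = b & 0x07; extra = 3; min = 0x10000;
            } else {
                throw new IllegalArgumentException("illegal lead byte at offset " + start);
            }
            if (i + extra > data.length) {
                throw new IllegalArgumentException("truncated sequence at offset " + start);
            }
            for (int k = 0; k < extra; k++) {
                int c = data[i++] & 0xFF;
                if ((c & 0xC0) != 0x80) {
                    throw new IllegalArgumentException("bad continuation byte at offset " + (i - 1));
                }
                cp = (cp << 6) | (c & 0x3F);
            }
            if (cp < min) {
                throw new IllegalArgumentException("overlong encoding at offset " + start);
            }
            if (cp > 0x10FFFF) {
                throw new IllegalArgumentException("code point out of range at offset " + start);
            }
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                throw new IllegalArgumentException("encoded surrogate at offset " + start);
            }
            out.appendCodePoint(cp); // emits a surrogate pair when cp >= 0x10000
        }
        return out.toString();
    }
}
```

For example, decode(new byte[] { (byte) 0xC0, (byte) 0x80 }) is rejected as an overlong form of U+0000, while the four bytes F0 9D 84 9E decode to a two-char String.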
Along the way I did some microbenchmarking of the decoder against 1.4.2
client and server JVMs. It's still substantially faster to use our own
decoder here rather than use the String ctor (factor of 2 difference).
The new checks for illegal encodings add about a 10-15% overhead.
-O
Attachment | Content-Type | Size |
---|---|---|
pgjdbc-support-high-unicode.txt | text/plain | 16.3 KB |