
Re: Postgresql JDBC UTF8 Conversion Throughput

From:	Kris Jurka <books(at)ejurka(dot)com>
To:	Paul Lindner <lindner(at)inuus(dot)com>
Cc:	pgsql-jdbc(at)postgresql(dot)org
Subject:	Re: Postgresql JDBC UTF8 Conversion Throughput
Date:	2008-09-19 07:29:45
Message-ID:	Pine.BSO.4.64.0809190313560.4245@leary.csoft.net
Lists:	pgsql-jdbc

On Mon, 2 Jun 2008, Paul Lindner wrote:

> It turns out that using more than two character sets in your Java
> application causes very poor throughput because of synchronization
> overhead. I wrote about this here:
>
> http://paul.vox.com/library/post/the-mysteries-of-java-character-set-performance.html
>

Very interesting.

> In Java 1.6 there's an easy way to fix this charset lookup problem.
> Just create a static Charset for UTF-8 and pass that to getBytes(...)
> instead of the string constant "UTF-8".
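Below is a minimal sketch of the static-Charset approach Paul describes. It assumes Java 1.6 or later, where `String.getBytes(Charset)` was added. The names `Utf8Encoding` and `encode` are hypothetical and are not taken from the driver:

```java
import java.nio.charset.Charset;

final class Utf8Encoding {
    // Resolved once at class-initialization time, so later calls never
    // go through the by-name charset lookup.
    static final Charset UTF8 = Charset.forName("UTF-8");

    static byte[] encode(String s) {
        return s.getBytes(UTF8); // String.getBytes(Charset) exists since Java 1.6
    }
}
```

For example, `Utf8Encoding.encode("café")` returns 5 bytes, because `é` takes two bytes in UTF-8.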

Note that this is actually a performance hit (when you aren't stuck doing
charset lookups), see

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6633613

> For backwards compatibility with Java 1.4 you can use the attached
> patch instead. It uses nio classes to do the UTF-8 to byte
> conversion.
>
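The attached patch is not reproduced here. As a rough illustration only, the following is a Java 1.4-compatible encoder built on the NIO classes. `NioUtf8Encoding` is a hypothetical name, and `Charset.encode(String)` (available since 1.4) stands in for whatever the patch actually does:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

final class NioUtf8Encoding {
    // Pre-resolved Charset: no by-name lookup on the hot path.
    private static final Charset UTF8 = Charset.forName("UTF-8");

    static byte[] encode(String s) {
        // Charset.encode(String) replaces malformed/unmappable input
        // with the charset's default replacement bytes.
        ByteBuffer buf = UTF8.encode(s);
        // Copy only the encoded bytes; the buffer's backing array may be
        // larger than its content, so buf.array() is not safe to return.
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }
}
```

The extra copy out of the `ByteBuffer` is one plausible reason this path costs more than a plain `getBytes` call in the single-threaded case.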

This is also a performance loser in the simple case. The attached test
case shows times of:

Doing 10000000 iterations of each.
2606 getBytes(String)
6200 getBytes(Charset)
3346 via ByteBuffer
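TimeCharset.java itself is not shown in this archive view. The following is a hypothetical single-threaded micro-benchmark along the same lines, not the original attachment. The input string is an assumption. The benchmark has no warm-up phase and no JIT controls, so its numbers (printed in milliseconds) are rough and will differ from the figures above depending on JVM and hardware:

```java
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

final class TimeCharsetSketch {
    static final Charset UTF8 = Charset.forName("UTF-8");
    static final String INPUT = "a short query parameter"; // hypothetical input
    static long sink; // accumulates results so the JIT cannot drop the work

    // 1. Charset looked up by name on every call.
    static byte[] viaName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM must support UTF-8
        }
    }

    // 2. Pre-resolved Charset (Java 1.6+).
    static byte[] viaCharset(String s) {
        return s.getBytes(UTF8);
    }

    // 3. NIO encode into a ByteBuffer, then copy out (Java 1.4+).
    static byte[] viaByteBuffer(String s) {
        ByteBuffer buf = UTF8.encode(s);
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Runs one strategy 'iterations' times and returns elapsed milliseconds.
    static long time(int mode, int iterations) {
        long start = System.nanoTime();
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] b;
            switch (mode) {
                case 0:  b = viaName(INPUT); break;
                case 1:  b = viaCharset(INPUT); break;
                default: b = viaByteBuffer(INPUT); break;
            }
            total += b.length;
        }
        sink += total;
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 10000000;
        System.out.println("Doing " + n + " iterations of each.");
        System.out.println(time(0, n) + " getBytes(String)");
        System.out.println(time(1, n) + " getBytes(Charset)");
        System.out.println(time(2, n) + " via ByteBuffer");
        System.out.println("(checksum " + sink + ")"); // keeps 'sink' live
    }
}
```

Because it runs on a single thread, it only measures the uncontended case. It does not reproduce the multi-charset blocking that started the thread.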

It would be nice to fix the blocking problem, but it seems like a rather
unusual situation to be in (using exactly one charset more than the
two-charset cache holds). If you've got more than three charsets in play,
fixing the JDBC driver won't help you, because at most it could eliminate
one. So I'd like the driver to be a good citizen, but I'm not convinced the
performance hit is worth it without some more field reports or benchmarks.
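The following toy model shows why being exactly one charset over a two-entry cache hurts. It is a two-slot, most-recently-used name cache written for illustration; it is a simplification, not the JDK's actual code. It only counts how often a lookup would fall through to the slow path, which is the synchronized path in the JDK. `TwoSlotCacheModel` and `missesFor` are hypothetical names:

```java
final class TwoSlotCacheModel {
    private String slot1; // most recently used charset name
    private String slot2; // second most recently used charset name
    private int misses;   // lookups that would take the slow path

    void lookup(String name) {
        if (name.equals(slot1)) {
            return;             // fast hit in the first slot
        }
        if (name.equals(slot2)) {
            slot2 = slot1;      // hit in the second slot: promote it
            slot1 = name;
            return;
        }
        misses++;               // miss: the expensive lookup would happen here
        slot2 = slot1;          // evict the older entry
        slot1 = name;
    }

    // Cycles through 'names' for 'rounds' rounds and reports the miss count.
    static int missesFor(String[] names, int rounds) {
        TwoSlotCacheModel cache = new TwoSlotCacheModel();
        for (int r = 0; r < rounds; r++) {
            for (String n : names) {
                cache.lookup(n);
            }
        }
        return cache.misses;
    }

    public static void main(String[] args) {
        String[] two = {"UTF-8", "ISO-8859-1"};
        String[] three = {"UTF-8", "ISO-8859-1", "US-ASCII"};
        System.out.println("2 charsets: " + missesFor(two, 1000) + " misses");
        System.out.println("3 charsets: " + missesFor(three, 1000) + " misses");
    }
}
```

With two names alternating, only the first two lookups miss. When the model cycles through three names, all 3000 lookups miss, because each new name evicts the one that is needed next. This is the case where threads would pile up on the slow path.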

Maybe it depends on how much reading vs. writing is done. Right now we have
our own UTF-8 decoder, so this hit only happens when encoding data to send
to the DB. If you're loading a lot of data this might be a problem, but if
you're sending a small query with a couple of parameters, then perhaps the
thread safety is more important.
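A hand-written decoder never asks the JDK to resolve a charset, so it avoids the by-name lookup on the read path. The following is a simplified sketch of that idea, not the driver's actual decoder. `SimpleUtf8Decoder` is a hypothetical name, and the sketch assumes well-formed UTF-8 input: it does not validate malformed, truncated, or overlong sequences.

```java
final class SimpleUtf8Decoder {
    // Decodes well-formed UTF-8 from b[off .. off+len) into a String.
    // Assumption: input is valid UTF-8; no error checking is done.
    static String decode(byte[] b, int off, int len) {
        char[] out = new char[len]; // UTF-8 never yields more UTF-16 chars than bytes
        int o = 0;
        int i = off;
        int end = off + len;
        while (i < end) {
            int c = b[i++] & 0xFF;
            if (c < 0x80) {                 // 1 byte: 0xxxxxxx
                out[o++] = (char) c;
            } else if (c < 0xE0) {          // 2 bytes: 110xxxxx 10xxxxxx
                out[o++] = (char) (((c & 0x1F) << 6) | (b[i++] & 0x3F));
            } else if (c < 0xF0) {          // 3 bytes: 1110xxxx + 2 continuation bytes
                out[o++] = (char) (((c & 0x0F) << 12)
                        | ((b[i++] & 0x3F) << 6)
                        | (b[i++] & 0x3F));
            } else {                        // 4 bytes: code point above U+FFFF
                int cp = ((c & 0x07) << 18)
                        | ((b[i++] & 0x3F) << 12)
                        | ((b[i++] & 0x3F) << 6)
                        | (b[i++] & 0x3F);
                cp -= 0x10000;              // split into a UTF-16 surrogate pair
                out[o++] = (char) (0xD800 | (cp >> 10));
                out[o++] = (char) (0xDC00 | (cp & 0x3FF));
            }
        }
        return new String(out, 0, o);
    }
}
```

Java evaluates operands left to right, so the successive `b[i++]` reads within one expression consume the continuation bytes in order.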

Kris Jurka

Attachment: TimeCharset.java (text/plain, 1.0 KB)

In response to

Postgresql JDBC UTF8 Conversion Throughput at 2008-06-02 08:57:37 from Paul Lindner
