Re: Postgresql JDBC UTF8 Conversion Throughput

From: Paul Lindner <lindner(at)inuus(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Postgresql JDBC UTF8 Conversion Throughput
Date: 2008-09-19 18:36:13
Message-ID: 223E2A3B-A4BE-4EC3-A598-0079A2965334@inuus.com
Lists: pgsql-jdbc

> From: Kris Jurka <books(at)ejurka(dot)com>
> Date: September 19, 2008 12:29:45 AM PDT
> To: Paul Lindner <lindner(at)inuus(dot)com>
> Cc: pgsql-jdbc(at)postgresql(dot)org
> Subject: Re: Postgresql JDBC UTF8 Conversion Throughput
>
>
>
> On Mon, 2 Jun 2008, Paul Lindner wrote:
>
>> It turns out that using more than two character sets in your Java
>> application causes very poor throughput because of synchronization
>> overhead. I wrote about this here:
>>
>> http://paul.vox.com/library/post/the-mysteries-of-java-character-set-performance.html
>>
>
> Very interesting.
>
>> In Java 1.6 there's an easy way to fix this charset lookup problem.
>> Just create a static Charset for UTF-8 and pass that to getBytes(...)
>> instead of the string constant "UTF-8".
>
> Note that this is actually a performance hit (when you aren't stuck
> doing charset lookups), see
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6633613
>
>> For backwards compatibility with Java 1.4 you can use the attached
>> patch instead. It uses nio classes to do the UTF-8 to byte
>> conversion.
>>
>
> This is also a performance loser in the simple case. The attached
> test case shows times of:
>
> Doing 10000000 iterations of each.
> 2606 getBytes(String)
> 6200 getBytes(Charset)
> 3346 via ByteBuffer
>
> It would be nice to fix the blocking problem, but it seems like a
> rather unusual situation to be in (being one charset over the two
> charset cache). If you've got more than three charsets in play then
> fixing the JDBC driver won't help you because at most it could
> eliminate one. So I'd like the driver to be a good citizen, but I'm
> not convinced the performance hit is worth it without having some
> more field reports or benchmarks.
>
> Maybe it depends how much reading vs writing is done. Right now we
> have our own UTF8 decoder so this hit only happens when encoding
> data to send it to the DB. If you're loading a lot of data this
> might be a problem, but if you're sending a small query with a
> couple of parameters, then perhaps the thread safety is more
> important.
>

Hi Kris,

getBytes(String) with a constant charset name will always win:
StringCoding.java (see http://www.docjar.net/html/api/java/lang/StringCoding.java.html)
caches the most recently used charsets locally, so the lookup is nearly free.
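The Java 1.6 approach mentioned earlier can be sketched as below: cache a Charset in a static field once and pass it to getBytes(Charset), so no per-call charset-name lookup through StringCoding's synchronized cache is needed (class and method names here are illustrative, not from the driver):

```java
import java.nio.charset.Charset;

public class StaticCharset {
    // Looked up once; Charset objects are immutable and thread-safe.
    private static final Charset UTF8 = Charset.forName("UTF-8");

    public static byte[] toUtf8(String s) {
        // getBytes(Charset) exists since Java 1.6 and bypasses the
        // name-based lookup that getBytes(String) performs.
        return s.getBytes(UTF8);
    }
}
```

Note this is exactly the variant Kris measured as slower in the single-charset case (Sun bug 6633613), since it cannot reuse StringCoding's cached encoder.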

When two or more character sets are in use, single-thread performance of
getBytes(Charset) and getBytes(String) is about the same, with
getBytes(String) slightly ahead. ByteBuffer ends up being the big
winner:

Doing 10000000 iterations of each for string - 'abcd1234'
15662 getBytes(Charset)
14958 getBytes(String)
10098 via ByteBuffer
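The "via ByteBuffer" variant above refers to encoding through the nio CharsetEncoder API instead of String.getBytes, which avoids StringCoding's synchronized charset cache entirely. A minimal sketch of that approach (the class name and ThreadLocal caching are illustrative assumptions, not the attached patch itself; CharsetEncoder instances are not thread-safe, hence one per thread):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class Utf8Encode {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // CharsetEncoder is stateful and not thread-safe, so cache one
    // per thread rather than creating one per call.
    private static final ThreadLocal<CharsetEncoder> ENCODER =
        new ThreadLocal<CharsetEncoder>() {
            @Override
            protected CharsetEncoder initialValue() {
                return UTF8.newEncoder();
            }
        };

    public static byte[] encode(String s) throws CharacterCodingException {
        // encode(CharBuffer) resets the encoder, encodes the whole
        // input, and returns a flipped, ready-to-read ByteBuffer.
        ByteBuffer bb = ENCODER.get().encode(CharBuffer.wrap(s));
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }
}
```

This is nio-only, so it also runs on Java 1.4, matching the backwards-compatibility goal of the attached patch.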

In any case all of this only pertains to single thread performance.
Our web apps are running on 8 and 16 core systems where contention is
the biggest performance killer.
