Re: new String(byte[]) performance

From: Barry Lind <barry(at)xythos(dot)com>
To: Teofilis Martisius <teo(at)teohome(dot)lzua(dot)lt>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: new String(byte[]) performance
Date: 2002-10-20 03:02:37
Message-ID: 3DB21CCD.1030108@xythos.com
Lists: pgsql-jdbc


Teofilis,

I have applied this patch. I also made a change so that this
optimization is always used when connected to a 7.3 database. This
is done by having the server do the character set encoding/decoding and
always using UTF-8 when talking to the JDBC client.

thanks,
--Barry

Teofilis Martisius wrote:
> Hello,
>
> While looking through postgresql JDBC driver sources and profiling, I
> noticed that the driver uses new String(byte[]) a lot while iterating a
> ResultSet. And I noticed that this String constructor takes a lot of
> time. I wrote a custom byte[]->String conversion method for UTF-8 that
> speeds up iterating over ResultSet 2 times or even more. I have a patch
> for the PostgreSQL JDBC driver, but well, this is a workaround and I am not
> sure it will be accepted. It does speed things up by a noticeable amount.
>
> Hmm, maybe decodeUTF8() should be synchronized on cdata, or maybe cdata
> should be allocated for each call. The static cdata version was faster.
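
The "allocated for each call" option mentioned above could look roughly like the sketch below. It is only an illustration, not the code that was applied: the class name Utf8PerCallBuffer is hypothetical, the sketch assumes that length is a byte count (so the loop ends at offset + length), and it assumes well-formed input containing only 1- to 3-byte sequences. Because the buffer is local to the call, there is no shared state to synchronize on.

```java
final class Utf8PerCallBuffer {
    /**
     * Decodes 1- to 3-byte UTF-8 sequences from data[offset .. offset + length).
     * Assumption: input is well-formed and contains no 4-byte sequences.
     */
    static String decodeUTF8(byte[] data, int offset, int length) {
        // One buffer per call: decoding never yields more chars than input bytes.
        char[] cdata = new char[length];
        int end = offset + length;
        int i = offset;
        int j = 0;
        while (i < end) {
            int z = data[i] & 0xFF;
            if (z < 0x80) {
                // Single byte: plain ASCII.
                cdata[j++] = (char) z;
                i++;
            } else if (z >= 0xE0) {
                // Three bytes: 4 + 6 + 6 payload bits.
                int y = data[i + 1] & 0x3F;
                int x = data[i + 2] & 0x3F;
                cdata[j++] = (char) (((z & 0x0F) << 12) | (y << 6) | x);
                i += 3;
            } else {
                // Two bytes: 5 + 6 payload bits.
                int y = data[i + 1] & 0x3F;
                cdata[j++] = (char) (((z & 0x1F) << 6) | y);
                i += 2;
            }
        }
        return new String(cdata, 0, j);
    }
}
```

The masks and shifts compute the same values as the subtract-and-multiply arithmetic in the patch for well-formed input. Whether the extra allocation costs more than synchronizing on a shared buffer would have to be measured.
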
>
> By the way, what should a JDBC driver do when e.g. ResultSet.getInt() is
> called for a VARCHAR field? I would suggest converting byte arrays to
> Strings or even to more precisely typed values (Integers, Doubles and so
> on) in QueryExecutor.execute(). This should save some memory allocation
> for receiveTuple, because now memory gets allocated several times: once
> for the byte[], a second time for the String, and a third time for the
> Integer or other object in getObject(). Memory allocation takes a
> considerable amount of time. But this stronger typing would remove some
> of the flexibility of calling any getXXX on a field of any SQL type. And
> it would probably make the querying itself (QueryExecutor.execute())
> slower, I don't know :/
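
One way to shorten the byte[] -> String -> Integer chain described above, without fixing the column types at query time, is to parse numeric values straight from the received bytes when getInt() is called. The following is a minimal sketch under the assumption that the value arrives as ASCII decimal text; AsciiInt and its parseInt method are hypothetical names, not part of the driver.

```java
final class AsciiInt {
    /**
     * Parses a decimal int from data[offset .. offset + length) without
     * creating an intermediate String. Assumes ASCII text such as "-123".
     */
    static int parseInt(byte[] data, int offset, int length) {
        if (length <= 0) {
            throw new NumberFormatException("empty value");
        }
        int i = offset;
        int end = offset + length;
        boolean negative = data[i] == '-';
        if (negative || data[i] == '+') {
            i++;
        }
        if (i == end) {
            throw new NumberFormatException("sign without digits");
        }
        // Accumulate in a long so that range checks stay simple.
        long value = 0;
        while (i < end) {
            int digit = data[i++] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a decimal digit");
            }
            value = value * 10 + digit;
            if (value > 2147483648L) {
                throw new NumberFormatException("out of int range");
            }
        }
        if (negative) {
            value = -value;
        }
        if (value > Integer.MAX_VALUE) {
            throw new NumberFormatException("out of int range");
        }
        return (int) value;
    }
}
```

This keeps the flexibility of calling any getXXX on any field, since the raw bytes are still what is stored; only the conversion path for one accessor changes.
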
>
> Teofilis Martisius
>
> Anyway, here is the patch to fix string decoding:
>
> diff -r -u ./org/postgresql/core/Encoding.java /usr/src/postgresql-7.2.2fixed/src/interfaces/jdbc/org/postgresql/core/Encoding.java
> --- ./org/postgresql/core/Encoding.java 2001-11-20 00:33:37.000000000 +0200
> +++ /usr/src/postgresql-7.2.2fixed/src/interfaces/jdbc/org/postgresql/core/Encoding.java 2002-09-11 15:56:10.000000000 +0200
> @@ -155,6 +155,9 @@
> }
> else
> {
> + if (encoding.equals("UTF-8")) {
> + return decodeUTF8(encodedString, offset, length);
> + }
> return new String(encodedString, offset, length, encoding);
> }
> }
> @@ -163,6 +166,43 @@
> throw new PSQLException("postgresql.stream.encoding", e);
> }
> }
> + /**
> + * custom byte[] -> String conversion routine, 3x-10x faster than standard new String(byte[])
> + */
> + static final int pow2_6 = 64; // 2^6
> + static final int pow2_12 = 4096; // 2^12
> + static char cdata[] = new char[50];
> +
> + public static final String decodeUTF8(byte data[], int offset, int length) {
> + if (cdata.length < (length-offset)) {
> + cdata = new char[length-offset];
> + }
> + int i = offset;
> + int j = 0;
> + int z, y, x, val;
> + while (i < length) {
> + z = data[i] & 0xFF;
> + if (z < 0x80) {
> + cdata[j++] = (char)data[i];
> + i++;
> + } else if (z >= 0xE0) { // length == 3
> + y = data[i+1] & 0xFF;
> + x = data[i+2] & 0xFF;
> + val = (z-0xE0)*pow2_12 + (y-0x80)*pow2_6 + (x-0x80);
> + cdata[j++] = (char) val;
> + i+= 3;
> + } else { // length == 2 (maybe add checking for length > 3, throw exception if it is)
> + y = data[i+1] & 0xFF;
> + val = (z - 0xC0)* (pow2_6)+(y-0x80);
> + cdata[j++] = (char) val;
> + i+=2;
> + }
> + }
> +
> + String s = new String(cdata, 0, j);
> + return s;
> + }
> +
>
> /*
> * Decode an array of bytes into a string.
>
>
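
The quoted patch treats every lead byte of 0xE0 or above as the start of a three-byte sequence, and its own comment suggests adding a check for longer sequences. A sketch of such a check is below. It classifies a lead byte by the sequence length it announces, using the ranges UTF-8 allows (0xC2-0xDF for two bytes, 0xE0-0xEF for three, 0xF0-0xF4 for four). Utf8LeadByte and sequenceLength are hypothetical names; this is not what was committed to the driver.

```java
final class Utf8LeadByte {
    /**
     * Returns the number of bytes in the UTF-8 sequence that starts with
     * the given lead byte, or throws if the byte cannot start a sequence.
     */
    static int sequenceLength(int leadByte) {
        int z = leadByte & 0xFF;
        if (z < 0x80) {
            return 1; // 0xxxxxxx: ASCII
        }
        if (z < 0xC2) {
            // 0x80-0xBF are continuation bytes; 0xC0 and 0xC1 would be overlong.
            throw new IllegalArgumentException("invalid UTF-8 lead byte: " + z);
        }
        if (z < 0xE0) {
            return 2; // 110xxxxx
        }
        if (z < 0xF0) {
            return 3; // 1110xxxx
        }
        if (z < 0xF5) {
            return 4; // 11110xxx, code points above U+FFFF
        }
        throw new IllegalArgumentException("invalid UTF-8 lead byte: " + z);
    }
}
```

A decoder that sees a length of 4 cannot store the result in a single Java char; it would need to emit a surrogate pair or throw, depending on what the driver wants to support.
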
