Re: PreparedStatement parameters and mutable objects

From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Oliver Jowett <oliver(at)opencloud(dot)com>
Cc: Paul Thomas <paul(at)tmsl(dot)demon(dot)co(dot)uk>, pgsql-jdbc <pgsql-jdbc(at)postgresql(dot)org>, Kris Jurka <books(at)ejurka(dot)com>
Subject: Re: PreparedStatement parameters and mutable objects
Date: 2004-01-13 12:34:59
Message-ID: 1073997299.1149.26.camel@localhost.localdomain
Lists: pgsql-jdbc

Oliver,

OK, now how do we test your estimates? If it truly shows this kind of
improvement, you'd sway me!

Dave
On Mon, 2004-01-12 at 21:02, Oliver Jowett wrote:
> Paul Thomas wrote:
> > On 12/01/2004 00:39 Oliver Jowett wrote:
> >> The problem is that it's very specific to the application workload;
> >> what case do we measure?
> >
> > I think you would need something which closely models your app's workload.
> >
> >> The reason I'm interested in doing this is not the direct CPU
> >> overhead of object creation (none of the JDBC code is on our main
> >> execution path), but the effect that object creation has on GC
> >> interval and pause. We're running a low-latency app where an extra
> >> 50ms pause due to GC has a large impact on our latency figures, so
> >> the less garbage we generate in the first place, the better. We could
> >> farm the JDBC work out to a separate VM, but that gets complex quite
> >> fast.
> >
> >
> > Reducing the amount of garbage created is never a bad thing. Do you have
> > an estimate of the % reduction you think you could achieve?
>
> See the attached testcase; it does batched inserts of byte arrays. I've
> run this as:
>
> java -Xrunhprof:heap=all,file=insertgarbage.txt,depth=10,cutoff=0 \
> -verbose:gc -cp .:/net/java/java_libs/postgresql.jar \
> TestInsertGarbage \
> 'jdbc:postgresql:rhino?user=oliver&password=oliver' \
> 100000 100 100
>
> which does 100k inserts, each with a 100-byte array, in batches of 100.
> This is roughly the same as our application workload when it is doing a
> DB resync of a reasonably-sized provisioning database.
>
> In the resulting heap dump, the allocation hotspots are:
>
> QueryExecutor.sendQueryV2 -> Encoding.encode -> String.getBytes
>     (100000 byte[], 143738400 bytes)
>     (100000 byte[], 37144800 bytes)
>     (100000 byte[], 20000000 bytes)  // This is encoding the query text
> Statement.setBytes -> setString -> escapeString -> StringBuffer.append
>     (63000 char[], 81608000 bytes)
>     (19200 char[], 25018400 bytes)
>     (17800 char[], 23091200 bytes)
> Statement.setBytes -> setString -> StringBuffer.ensureCapacity
>     (100000 char[], 65445600 bytes)
> Statement.setBytes -> PGbytea.toPGString -> StringBuffer.<init>
>     (100000 char[], 41600000 bytes)
> Statement.setBytes -> PGbytea.toPGString -> StringBuffer.append
>     (21800 char[], 17788800 bytes)
>     (21400 char[], 17462400 bytes)
>     (21300 char[], 17380800 bytes)
>     (20800 char[], 16972800 bytes)
>     (14400 char[], 11750400 bytes)
> QueryExecutor.executeV2 -> Jdbc3PreparedStatement.createResultSet
>     (100000 Jdbc3ResultSet, 13600000 bytes)
>
> All other allocation points allocate <10MB. There's quite a bit of other
> object allocation that could be cleaned up, e.g. repeated allocation of
> QueryExecutor objects.
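>
> As a generic sketch of that kind of cleanup (illustrative names only,
> not the driver's actual internals): keep one helper per connection and
> reset it between queries, rather than allocating a fresh one each time.
>
>     // Hypothetical reuse pattern: one Executor per connection, with an
>     // internal buffer that is reset rather than reallocated per query.
>     public class ReuseSketch {
>         static final class Executor {
>             private final StringBuffer scratch = new StringBuffer();
>
>             String run(String sql) {
>                 scratch.setLength(0);                 // reset, don't reallocate
>                 return scratch.append("EXEC: ").append(sql).toString();
>             }
>         }
>
>         private final Executor executor = new Executor(); // allocated once
>
>         String execute(String sql) {
>             return executor.run(sql);                 // reused on every call
>         }
>
>         public static void main(String[] args) {
>             ReuseSketch conn = new ReuseSketch();
>             for (int i = 0; i < 3; ++i)
>                 System.err.println(conn.execute("SELECT " + i));
>         }
>     }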
>
> Total allocations were around 631800000 bytes. I estimate that streaming
> the parameter data would generate about 499000000 bytes less garbage
> (basically all of the above goes away except for encoding the query text
> and creating the resultsets), which is a 79% reduction (this is actually
> a lot more than I expected!).
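> (The arithmetic: 499000000 / 631800000 ≈ 0.79.)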
>
> There are improvements we can make here without streaming, though. Not
> creating resultset objects for updates, and doing all necessary escaping
> in a single pass rather than two, both seem like low-hanging fruit.
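>
> For illustration, a minimal sketch of what single-pass escaping of a
> bytea parameter might look like (assumptions: the old escape-style
> bytea literal, and a hypothetical method name; this is not the
> driver's actual code):
>
>     // Escape a bytea value for embedding in a SQL string literal in one
>     // pass, instead of PGbytea.toPGString followed by escapeString.
>     private static String escapeByteaOnePass(byte[] data) {
>         // Worst case, each byte expands to a multi-character escape.
>         StringBuffer sb = new StringBuffer(data.length * 5);
>         for (int i = 0; i < data.length; ++i) {
>             int b = data[i] & 0xff;
>             if (b == '\'') {
>                 sb.append("\\'");       // quote, escaped for the SQL literal
>             } else if (b == '\\') {
>                 sb.append("\\\\\\\\");  // backslash: doubled for bytea, doubled again for SQL
>             } else if (b < 0x20 || b > 0x7e) {
>                 sb.append("\\\\");      // octal escape: \\ooo in the literal
>                 sb.append((char) ('0' + ((b >> 6) & 03)));
>                 sb.append((char) ('0' + ((b >> 3) & 07)));
>                 sb.append((char) ('0' + (b & 07)));
>             } else {
>                 sb.append((char) b);
>             }
>         }
>         return sb.toString();
>     }
>
> That would remove both the intermediate toPGString string and the
> second escaping pass visible in the profile above.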
>
> -O
>
> ______________________________________________________________________
>
> import java.sql.*;
>
> public class TestInsertGarbage {
>     public static void main(String args[]) throws Exception {
>         if (args.length < 4) {
>             System.err.println("usage: java TestInsertGarbage <JDBC URL> <# of inserts> <array size> <batch size>");
>             return;
>         }
>
>         String driverURL = args[0];
>         int numInserts = Integer.parseInt(args[1]);
>         int arraySize = Integer.parseInt(args[2]);
>         int batchSize = Integer.parseInt(args[3]);
>
>         // We don't want to count the data to be inserted itself, since that's
>         // mostly static and isn't garbage. So just create a single array
>         // to represent all the data. We'll reuse that on each insert.
>
>         // We also give it a random distribution of values, since that affects
>         // the exact length needed to escape the value.
>         byte[] dummyArray = new byte[arraySize];
>
>         Class.forName("org.postgresql.Driver");
>         Connection conn = DriverManager.getConnection(driverURL);
>
>         // Schema setup.
>         Statement stmt = conn.createStatement();
>         try {
>             stmt.executeUpdate("DROP TABLE test_insert_garbage");
>         } catch (SQLException e) {}
>         stmt.executeUpdate("CREATE TABLE test_insert_garbage(data bytea)");
>
>         // Testcase itself.
>         conn.setAutoCommit(false);
>         PreparedStatement inserter =
>             conn.prepareStatement("INSERT INTO test_insert_garbage(data) VALUES (?)");
>
>         // Main loop.
>         java.util.Random randomizer = new java.util.Random();
>         for (int i = 0; i < numInserts; ) {
>             // Refill the shared array with fresh random bytes for each batch.
>             randomizer.nextBytes(dummyArray);
>             for (int j = 0; j < batchSize && i < numInserts; ++i, ++j) {
>                 inserter.setBytes(1, dummyArray);
>                 inserter.addBatch();
>             }
>             inserter.executeBatch();
>             System.err.println("Inserted " + i + " rows");
>         }
>
>         // Done.
>         conn.commit();
>         System.err.println("Committed transaction.");
>         conn.close();
>     }
> }
>
> ______________________________________________________________________
>
--
Dave Cramer
519 939 0336
ICQ # 1467551
