Performance analysis of server-parsed PreparedStatements ...

From: James Robinson <jlrobins(at)socialserve(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Performance analysis of server-parsed PreparedStatements ...
Date: 2004-04-16 18:26:47
Message-ID: 9F1FF8C6-8FD3-11D8-9377-000A9566A412@socialserve.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

I began doing various benchmarks with JBoss 3.2.3 maintaining a pool of
prepared statements on postgres 7.4.2, with a few hacks to the JDBC
driver to ultimately get performance higher than without any pooled
prepared statements or server-side preparation. Things learned from
this endeavor should be of users of JBoss + PG, the JDBC driver
implementors, and backend coders.

The benchmarking was done with a subset of our EJB app's regression
test suite.

All JDBC hacks were done in AbstractJdbc1Statement.

Hacks:

1) Our primary keys are almost exclusively int8s. Tom's cross-integral
index operator fix for 7.5 would reduce the count of local hacks in the
JDBC driver to pull this off. Those hacks involve explicitly tacking on
'::INT8' to the value string in setLong(int, long) and setObject(int,
Object, int, int)'s "CASE Types.BIGINT" clause. Without this you get
the horse-beaten-to-death issue of sequential scan instead of index
scan for non-server prepared queries problem.

2) JBoss pays attention to the # rows updated count from update /
insert statements. Server prepared queries currently don't return the
updated row count. The asked-for protocol update would allow for this
value to be returned through JDBC regardless of prepared or not. So,
only server prepare if it is a select statement.

3) JBoss can (and will, depending on configuration settings + contents
of various runtime caches) issue rather stupidly large queries along
the lines of "select * from table where (id = 1) or (id = 2) or ... (id
= 750). Yep -- 752 node OR grouping. I chose to skip auto-server
preparation if the query had more than N parameters, with N currently
set to 30, since those would most likely never get reused again.

4) When server preparing a query, initialize a counter to count how
many times this prepared statement was reused. So, executing it twice
would yield a reuse count of 1. Print out this count + the SQL query
string when the prepared statement is closed.

Hacks #2 and #3 amounted to:

/** threshold for using a server-side prepared query -- never prepare
if more than this. */
private final static int MAX_ARGS_TO_PREPARE = 30;

public AbstractJdbc1Statement (BaseConnection connection, String
p_sql) throws SQLException
{
this.connection = connection;
parseSqlStmt(p_sql); // this allows Callable stmt to override
if(isSingleSelect() && m_bindTypes.length < MAX_ARGS_TO_PREPARE)
setUseServerPrepare(true);
}

I performed 3 sets of tests:

1) No server-preparation at all, no JBoss statement pooling (vanilla
setup).
2) Server preparation for single-select statements, but no JBoss
statement pooling.
3) Server preparation for single-select statements, JBoss pool of
prepared statements set to 500.

Each test run started with the same initial database state, including
all vacuumed and analyzed. JBoss restarted every time to have it start
with an empty EJB cache.

Here are the results (minutes:seconds), with high and low run removed:

Suite #1 (vanilla):
4:39.57
4:41.21
4:47.09

Suite #2 (server prepared but not jboss cached):
4:38.63
5:00.09
5:30.89

Suite #3 (server prepared, jboss cached):
3:53.79
3:56.74
3:55.50

Analysis:

Suite #2 performed the worst. I expect it to be due to the overhead of
closing the prepared statement right after using it (most likely
exactly once), requiring the close operation to issue an additional
jboss -> postgres round trip cost for the DEALLOCATE command. The
standard deviation between runs is also much higher -- perhaps
additional java GC overhead or the additional context switch jitters.
It is extremely unlikely for the jboss 3.2 series CMP layer to reuse a
preparedstatement within a single method. This can be easily seen from
the prepared statement names, each statement being named via an
auto-incrementing counter in the JDBC driver. A suite #2 run ends up
preparing + closing 11,848 single select statements, and of those,
absolutely none of them were reused within the lifetime of the prepared
statement.

Suite #3 performed the best, ~30 seconds better than 'vanilla'.
Attached are the reuse counters (along with the query that was reused),
sorted numerically by the reuse count. Many of the statements get
reused quite a bit, but they are relatively simple -- we don't do many
joins with more than 4 tables anywhere in our application, and looks
like none more than 2 or 3 tables in this test suite. So, combined with
pooled prepared statements, prepared statements *might* be a winner for
an EJB application. JBoss issues many individual queries, but reducing
them to a distinct set even across method calls brought them down by a
factor of 20, indicating a high hit rate in the jboss prepared
statement cache for this particular workload (for select statements,
anyway).

Given these results, I would recommend *not* turning on auto-server
preparation in the JDBC driver for any sort of query, be they single
select statements or updates, when the client -> backend protocol can
support returning the proper row update count, since it can definitely
impact performance of applications today.

The 'prepare on passing threshold' patch would be great here, since it
would have effectively made batch #2 equivalent to batch #1 (vanilla)
-- no additional cost when PreparedStatements are not cached + reused.
It would also enable performance just slightly worse than case #3 if
the threshold were set to something like 3 or 4 when also using the
JBoss statement cache, since the first 3 or so invocations would be
non-prepared, then auto-switching to server-prepared when it really
looked like this was a non-unique query (such as the 734-node OR
queries).

James

Attachment Content-Type Size
auto-prepared-and-jboss-pooled-statement-reusecounts.txt text/plain 15.0 KB

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Tao Yang 2004-04-16 21:29:39 Out of memory exception
Previous Message kenchen077 2004-04-15 19:23:37 How do handle concurrent DML operations