Skip site navigation (1) Skip section navigation (2)

Prepared Statement Query Planning

From: Brett Henderson <brett(at)bretth(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Prepared Statement Query Planning
Date: 2009-08-29 02:31:01
Message-ID: 4A9892E5.4010509@bretth.com (view raw or flat)
Thread:
Lists: pgsql-jdbc
Hi,

I'm trying to use a PreparedStatement to query a small number of rows 
from a table containing over 500 million rows.  I'm encountering bad 
query plans due to the query plan being constructed without access to 
the bind variables.

I've read this link which suggests disabling server side prepared 
statements but it doesn't appear to be helping.  I am happy to disable 
all server side prepared statements because I am effectively performing 
batch queries where the cost of a query plan is relatively small 
compared to the execution of the query.
http://jdbc.postgresql.org/documentation/83/server-prepare.html

I've added "prepareThreshold=0", and "prepareThreshold=3" to the 
connection URL but neither appear to have any effect.  I've checked the 
prepare threshold on the statement objects and found that they are being 
set to the values I specify.

Is there any way of verifying what type of statement is being issued to 
the server?

This is the query:
SELECT e.id, e.version, e.timestamp, e.visible, u.data_public,
u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude
FROM nodes e
INNER JOIN (
SELECT id, version FROM
nodes WHERE timestamp > ? AND timestamp <= ?
) t ON e.id = t.id AND e.version = t.version
INNER JOIN changesets c ON e.changeset_id = c.id
INNER JOIN users u ON c.user_id = u.id

The node table contains approx 500 million rows, the user table approx 
100 thousand rows, and the changeset table somewhere in between but 
closer to the lower end.  The server has 32 GB of RAM so can fit smaller 
tables in RAM, but not the node table.

The query retrieves all rows within a timestamp range.  I realise the 
query could be re-written without the sub-select, but it is implemented 
in this way as a result of the way the query is dynamically constructed 
and allows the sub-select portion can be switched out for other row 
selection criteria.

The timestamp range varies anywhere from 1 minute to 1 day, but no 
larger than that.  The table contains data over a period of several years.

The full JDBC trace with loglevel=2 is below.  Is it possible to tell 
whether server side prepared statements are being used from this?  Note 
that this has been obtained from a windows laptop running PostgreSQL 
8.3.5, but the real problem is occurring on an Ubuntu Linux server 
running PostgreSQL 8.3.7.  The same JDBC driver is being used across the 
board.
17:48:46.077 (1) PostgreSQL 8.3 JDBC4 with SSL (build 603)
17:48:46.091 (1) Trying to establish a protocol version 3 connection to 
localhost:5432
17:48:46.204 (1)  FE=> StartupPacket(user=osm, database=api06_test, 
client_encoding=UNICODE, DateStyle=ISO, extra_float_digits=2)
17:48:46.382 (1)  <=BE AuthenticationReqMD5(salt=xxxxxxxx)
17:48:46.391 (1)  FE=> 
Password(md5digest=md5xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
17:48:46.397 (1)  <=BE AuthenticationOk
17:48:46.408 (1)  <=BE ParameterStatus(client_encoding = UNICODE)
17:48:46.408 (1)  <=BE ParameterStatus(DateStyle = ISO, DMY)
17:48:46.409 (1)  <=BE ParameterStatus(integer_datetimes = off)
17:48:46.409 (1)  <=BE ParameterStatus(is_superuser = on)
17:48:46.409 (1)  <=BE ParameterStatus(server_encoding = UTF8)
17:48:46.409 (1)  <=BE ParameterStatus(server_version = 8.3.5)
17:48:46.409 (1)  <=BE ParameterStatus(session_authorization = osm)
17:48:46.409 (1)  <=BE ParameterStatus(standard_conforming_strings = off)
17:48:46.409 (1)  <=BE ParameterStatus(TimeZone = Australia/Canberra)
17:48:46.409 (1)  <=BE BackendKeyData(pid=8292,ckey=224285055)
17:48:46.409 (1)  <=BE ReadyForQuery(I)
17:48:46.409 (1)     compatible = 8.3
17:48:46.409 (1)     loglevel = 2
17:48:46.409 (1)     prepare threshold = 0
17:48:46.442 (1) simple execute, 
handler=org(dot)postgresql(dot)jdbc2(dot)AbstractJdbc2Statement$StatementResultHandler(at)1b8e059, 
maxRows=0, fetchSize=10000, flags=9
17:48:46.442 (1)  FE=> Parse(stmt=S_1,query="BEGIN",oids={})
17:48:46.443 (1)  FE=> Bind(stmt=S_1,portal=null)
17:48:46.443 (1)  FE=> Execute(portal=null,limit=0)
17:48:46.443 (1)  FE=> Parse(stmt=S_2,query="SELECT e.id, e.version, 
e.timestamp, e.visible, u.data_public, u.id AS user_id, u.display_name, 
e.changeset_id, e.latitude, e.longitude FROM nodes e LEFT OUTER JOIN 
changesets c ON e.changeset_id = c.id LEFT OUTER JOIN users u ON 
c.user_id = u.id WHERE e.timestamp > $1 AND e.timestamp <= $2 ORDER BY 
e.id, e.version",oids={0,0})
17:48:46.444 (1)  FE=> Describe(statement=S_2)
17:48:46.444 (1)  FE=> Bind(stmt=S_2,portal=C_3,$1=<1970-01-01 
10:00:00.000000 +10:00:00>,$2=<2009-08-28 17:48:45.932000 +10:00:00>)
17:48:46.445 (1)  FE=> Execute(portal=C_3,limit=10000)
17:48:46.445 (1)  FE=> Sync
17:48:46.636 (1)  <=BE ParseComplete [S_1]
17:48:46.636 (1)  <=BE BindComplete [null]
17:48:46.636 (1)  <=BE CommandStatus(BEGIN)
17:48:46.636 (1)  <=BE ParseComplete [S_2]
17:48:46.636 (1)  <=BE ParameterDescription
17:48:46.637 (1)  <=BE RowDescription(10)
17:48:46.638 (1)  <=BE BindComplete [C_3]
17:48:46.638 (1)  <=BE CommandStatus(SELECT)
17:48:46.650 (1)  <=BE ReadyForQuery(T)
17:48:46.651 (1) simple execute, 
handler=org(dot)postgresql(dot)jdbc2(dot)AbstractJdbc2Connection$TransactionCommandHandler(at)7b6889, 
maxRows=0, fetchSize=0, flags=22
17:48:46.651 (1)  FE=> CloseStatement(S_2)
17:48:46.651 (1)  FE=> ClosePortal(C_3)
17:48:46.651 (1)  FE=> Parse(stmt=S_4,query="COMMIT",oids={})
17:48:46.651 (1)  FE=> Bind(stmt=S_4,portal=null)
17:48:46.651 (1)  FE=> Execute(portal=null,limit=1)
17:48:46.651 (1)  FE=> Sync
17:48:46.652 (1)  <=BE CloseComplete
17:48:46.652 (1)  <=BE CloseComplete
17:48:46.652 (1)  <=BE ParseComplete [S_4]
17:48:46.652 (1)  <=BE BindComplete [null]
17:48:46.652 (1)  <=BE CommandStatus(COMMIT)
17:48:46.652 (1)  <=BE ReadyForQuery(I)
17:48:46.652 (1)  FE=> Terminate

If it helps, the following blog entry includes two query plan diagrams 
with and without bind variables.
http://www.odecee.com.au/blogs/?p=134

Any suggestions on what I'm doing wrong?

Brett


Responses

pgsql-jdbc by date

Next:From: Віталій ТимчишинDate: 2009-08-29 06:27:56
Subject: Re: Prepared Statement Query Planning
Previous:From: Maciek SakrejdaDate: 2009-08-28 05:50:23
Subject: Re: PGStream synchronization

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group