Prepared Statement Query Planning

From: Brett Henderson <brett(at)bretth(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Prepared Statement Query Planning
Date: 2009-08-29 02:31:01
Message-ID: 4A9892E5.4010509@bretth.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

Hi,

I'm trying to use a PreparedStatement to query a small number of rows
from a table containing over 500 million rows. I'm encountering bad
query plans due to the query plan being constructed without access to
the bind variables.

I've read this link which suggests disabling server side prepared
statements but it doesn't appear to be helping. I am happy to disable
all server side prepared statements because I am effectively performing
batch queries where the cost of a query plan is relatively small
compared to the execution of the query.
http://jdbc.postgresql.org/documentation/83/server-prepare.html

I've added "prepareThreshold=0", and "prepareThreshold=3" to the
connection URL but neither appear to have any effect. I've checked the
prepare threshold on the statement objects and found that they are being
set to the values I specify.

Is there any way of verifying what type of statement is being issued to
the server?

This is the query:
SELECT e.id, e.version, e.timestamp, e.visible, u.data_public,
u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude
FROM nodes e
INNER JOIN (
SELECT id, version FROM
nodes WHERE timestamp > ? AND timestamp <= ?
) t ON e.id = t.id AND e.version = t.version
INNER JOIN changesets c ON e.changeset_id = c.id
INNER JOIN users u ON c.user_id = u.id

The node table contains approx 500 million rows, the user table approx
100 thousand rows, and the changeset table somewhere in between but
closer to the lower end. The server has 32 GB of RAM so can fit smaller
tables in RAM, but not the node table.

The query retrieves all rows within a timestamp range. I realise the
query could be re-written without the sub-select, but it is implemented
in this way as a result of the way the query is dynamically constructed
and allows the sub-select portion can be switched out for other row
selection criteria.

The timestamp range varies anywhere from 1 minute to 1 day, but no
larger than that. The table contains data over a period of several years.

The full JDBC trace with loglevel=2 is below. Is it possible to tell
whether server side prepared statements are being used from this? Note
that this has been obtained from a windows laptop running PostgreSQL
8.3.5, but the real problem is occurring on an Ubuntu Linux server
running PostgreSQL 8.3.7. The same JDBC driver is being used across the
board.
17:48:46.077 (1) PostgreSQL 8.3 JDBC4 with SSL (build 603)
17:48:46.091 (1) Trying to establish a protocol version 3 connection to
localhost:5432
17:48:46.204 (1) FE=> StartupPacket(user=osm, database=api06_test,
client_encoding=UNICODE, DateStyle=ISO, extra_float_digits=2)
17:48:46.382 (1) <=BE AuthenticationReqMD5(salt=xxxxxxxx)
17:48:46.391 (1) FE=>
Password(md5digest=md5xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
17:48:46.397 (1) <=BE AuthenticationOk
17:48:46.408 (1) <=BE ParameterStatus(client_encoding = UNICODE)
17:48:46.408 (1) <=BE ParameterStatus(DateStyle = ISO, DMY)
17:48:46.409 (1) <=BE ParameterStatus(integer_datetimes = off)
17:48:46.409 (1) <=BE ParameterStatus(is_superuser = on)
17:48:46.409 (1) <=BE ParameterStatus(server_encoding = UTF8)
17:48:46.409 (1) <=BE ParameterStatus(server_version = 8.3.5)
17:48:46.409 (1) <=BE ParameterStatus(session_authorization = osm)
17:48:46.409 (1) <=BE ParameterStatus(standard_conforming_strings = off)
17:48:46.409 (1) <=BE ParameterStatus(TimeZone = Australia/Canberra)
17:48:46.409 (1) <=BE BackendKeyData(pid=8292,ckey=224285055)
17:48:46.409 (1) <=BE ReadyForQuery(I)
17:48:46.409 (1) compatible = 8.3
17:48:46.409 (1) loglevel = 2
17:48:46.409 (1) prepare threshold = 0
17:48:46.442 (1) simple execute,
handler=org(dot)postgresql(dot)jdbc2(dot)AbstractJdbc2Statement$StatementResultHandler(at)1b8e059,
maxRows=0, fetchSize=10000, flags=9
17:48:46.442 (1) FE=> Parse(stmt=S_1,query="BEGIN",oids={})
17:48:46.443 (1) FE=> Bind(stmt=S_1,portal=null)
17:48:46.443 (1) FE=> Execute(portal=null,limit=0)
17:48:46.443 (1) FE=> Parse(stmt=S_2,query="SELECT e.id, e.version,
e.timestamp, e.visible, u.data_public, u.id AS user_id, u.display_name,
e.changeset_id, e.latitude, e.longitude FROM nodes e LEFT OUTER JOIN
changesets c ON e.changeset_id = c.id LEFT OUTER JOIN users u ON
c.user_id = u.id WHERE e.timestamp > $1 AND e.timestamp <= $2 ORDER BY
e.id, e.version",oids={0,0})
17:48:46.444 (1) FE=> Describe(statement=S_2)
17:48:46.444 (1) FE=> Bind(stmt=S_2,portal=C_3,$1=<1970-01-01
10:00:00.000000 +10:00:00>,$2=<2009-08-28 17:48:45.932000 +10:00:00>)
17:48:46.445 (1) FE=> Execute(portal=C_3,limit=10000)
17:48:46.445 (1) FE=> Sync
17:48:46.636 (1) <=BE ParseComplete [S_1]
17:48:46.636 (1) <=BE BindComplete [null]
17:48:46.636 (1) <=BE CommandStatus(BEGIN)
17:48:46.636 (1) <=BE ParseComplete [S_2]
17:48:46.636 (1) <=BE ParameterDescription
17:48:46.637 (1) <=BE RowDescription(10)
17:48:46.638 (1) <=BE BindComplete [C_3]
17:48:46.638 (1) <=BE CommandStatus(SELECT)
17:48:46.650 (1) <=BE ReadyForQuery(T)
17:48:46.651 (1) simple execute,
handler=org(dot)postgresql(dot)jdbc2(dot)AbstractJdbc2Connection$TransactionCommandHandler(at)7b6889,
maxRows=0, fetchSize=0, flags=22
17:48:46.651 (1) FE=> CloseStatement(S_2)
17:48:46.651 (1) FE=> ClosePortal(C_3)
17:48:46.651 (1) FE=> Parse(stmt=S_4,query="COMMIT",oids={})
17:48:46.651 (1) FE=> Bind(stmt=S_4,portal=null)
17:48:46.651 (1) FE=> Execute(portal=null,limit=1)
17:48:46.651 (1) FE=> Sync
17:48:46.652 (1) <=BE CloseComplete
17:48:46.652 (1) <=BE CloseComplete
17:48:46.652 (1) <=BE ParseComplete [S_4]
17:48:46.652 (1) <=BE BindComplete [null]
17:48:46.652 (1) <=BE CommandStatus(COMMIT)
17:48:46.652 (1) <=BE ReadyForQuery(I)
17:48:46.652 (1) FE=> Terminate

If it helps, the following blog entry includes two query plan diagrams
with and without bind variables.
http://www.odecee.com.au/blogs/?p=134

Any suggestions on what I'm doing wrong?

Brett

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Віталій Тимчишин 2009-08-29 06:27:56 Re: Prepared Statement Query Planning
Previous Message Maciek Sakrejda 2009-08-28 05:50:23 Re: PGStream synchronization