Skip site navigation (1) Skip section navigation (2)

Re: Prepared Statements and large where-id-in constant blocks?

From: James Robinson <jlrobins(at)socialserve(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,Oliver Jowett <oliver(at)opencloud(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org, pgsql-sql(at)postgresql(dot)org
Subject: Re: Prepared Statements and large where-id-in constant blocks?
Date: 2004-04-20 15:35:39
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-jdbcpgsql-sql
On Apr 19, 2004, at 10:57 PM, Oliver Jowett wrote:

> Unfortunately a bit of experimentation indicates that the planner 
> doesn't do anything clever with ANY + constant array values (at least 
> in 7.4.1 which is what I have to hand):

Not only that, but it seems to get planned only as an index scan. 
Preparing the statement
using "SELECT ... WHERE id = ANY (?)" plus a call to a really-hacked up 
version of
setObject() would solve the issue of getting better use out of fewer 
cached prepared
statements, but the only-sequential scan planning would be a downer.

And while "OR (id=N)" plans exactly like "id IN (N)", there seems to be 
nothing really
worth doing. I examined plans comparing a 6-way join used in our 
production code with
4 ids in the tail "OR (id=N)" nodes, then with the full 723 ids, and 
the plans were markedly
different, preferring sequential scans for many of the intermediate 
table joins in the 723-id
case.The runtimes for the sequential scans were faster than forcing 
index scans, so
the planner's voodoo definitely benefits from full knowledge of how 
many rows should
be expected (which appears crystal clear in hindsight). So, I doubt 
that any single server-side
preparation using a single parameter representing an entire collection 
of ids could perform
as well as it does currently with full information.

Oliver, I tested your proposal of providing 
more-id-params-than-necessary, passing in a
dummy value (-1) which will never be found as a pk in that table, and 
planner handled it efficiently. The repeated instances of "or" were, when not
pre-planned using PREPARE, were nicely crunched down to a single index 
clause of " OR (id= -1::BIGINT)". But, when transforming this to 
pairs, the planner cannot crunch the plan down, since it has no idea 
that, say, 500 of the
700 params will all be the same, so it cannot factor them out at 
planning time (hence, I guess
the warning about constant values in the notes section of the manual 
page for PREPARE).

All roads seem to lead to don't attempt to change a thing -- there is 
no easy or medium
difficulty way to better solve this.

In writing this, I went so far as to think about shunting the list of 
ids into a temporary table
to join off of. Doing this at the SQL-level would be far too costly in 
round-trip times, but
could the "WHERE id in (A, B, C, ...)" form somehow be transformed into 
a hashjoin
operation on the backend when the size of the static set is 'high'? 
Could this not perform
(theoretically) better than what appears to be an O(N) index condition 
evaluation? I am
asking only for personal edification -- I have no sample live query 
where the index condition
solution performs too slowly. In Java-land, if we suppose that the size 
of the set could potentially
be 'large', we quickly defer to containing the values in a HashSet if 
we're going to test for
membership as opposed to performing selection searches on a list. 
Probably a dead-horse
beaten elsewhere.

Many Thanks.

James Robinson

In response to

pgsql-sql by date

Next:From: H.J. SandersDate: 2004-04-20 19:14:48
Subject: transaction
Previous:From: Rod TaylorDate: 2004-04-20 15:32:03
Subject: Re: Can someone tell me why this statement is failing?

pgsql-jdbc by date

Next:From: Dave CramerDate: 2004-04-20 16:46:16
Subject: Re: slow seqscan
Previous:From: Oliver JowettDate: 2004-04-20 05:49:17
Subject: Re: binary data in `bytea' column

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group