Re: Very strange performance decrease when reusing a PreparedStatement

From: Péter Kovács <maxottovonstirlitz(at)gmail(dot)com>
To: Oliver Jowett <oliver(at)opencloud(dot)com>
Cc: Dave Cramer <pg(at)fastcrypt(dot)com>, Frédérik Bilhaut <frederik(dot)bilhaut(at)noopsis(dot)fr>, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Very strange performance decrease when reusing a PreparedStatement
Date: 2009-05-04 08:27:02
Message-ID: fdeb32eb0905040127i2215c18cy47600c1ec1605188@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

On Mon, May 4, 2009 at 1:43 AM, Oliver Jowett <oliver(at)opencloud(dot)com> wrote:
> Péter Kovács wrote:
>> On Sun, May 3, 2009 at 12:10 PM, Dave Cramer <pg(at)fastcrypt(dot)com> wrote:
>>>> It appears that the Postgres "server-prepared statement" cannot handle
>>>> parameters to the statement. This is really unfortunate, because 99%
>>>> of real-life applications will want to re-use the same statement
>>>> (template) with different parameters.
>>>>
>>>> The term "server-prepared statement" itself already indicates that
>>>> there may be something skewed about the "local" semantics of
>>>> java.sql.PreparedStatements in the Postgres JDBC driver. There is no
>>>> notion of "client-prepared statement" in the JDBC API, which conceives
>>>> PreparedStatement instances as mere handles to server side objects.
>
> There's no requirement in JDBC that PreparedStatement be merely a handle
> to a server-side object - that's just one implementation option, and a
> poor one to choose for the PostgreSQL wire protocol in its current form.
> At a bare minimum, we cache much of the statement and parameter
> information on the client side. (Can you imagine every setXXX call doing
> a network round-trip, really?)
>
>>>> And indeed, Postgres JDBC users have historically been using
>>>> java.sql.PreparedStatements for its side-effect of preventing SQL
>>>> injection rather than for the purpose the JDBC API designers had in
>>>> mind with this class.
>>> I'm not sure where this hypothesis is coming from. Postgresql server
>>> prepared statements can certainly handle parameters.
>>>
>>> What makes you think it can't ?
>>
>> http://jdbc.postgresql.org/documentation/83/server-prepare.html:
>>
>> "Server side prepared statements are planned only once by the server.
>> This avoids the cost of replanning the query every time, but also
>> means that the planner cannot take advantage of the particular
>> parameter values used in a particular execution of the query. You
>> should be cautious about enabling the use of server side prepared
>> statements globally."
>
> I don't think you're understanding that correctly. It has nothing to do
> with how parameters are passed.
>
> Are you familiar with how the wire protocol works?
>
> If you have a query:
>
>  SELECT * FROM foo WHERE someindexedvalue=?
>
> When the unnamed statement is used, a new query plan is constructed when
> parameters are bound to the statement before execution, and the
> *particular* parameter value provided is given to the query planner.
>
> The query planner then make decisions based on that actual value. For
> example, if the table statistics say that the particular value used is
> very selective for that column, it may decide to use an index scan over
> a table scan.
>
> When a named statement is used, the server behavior changes. The query
> is planned once, when the statement is created but before parameter
> values are bound. The planner does not have particular parameter values
> available, and it creates a general plan suitable for any parameter
> value. This can result in different index selectivity information,
> because the values that will be actually be used are not known ahead of
> time .. which in turn can result in different plans. The resulting plan
> may be "good" for any random value but "bad" for a particular value.
> Often apprications are not passing in any random value - there are
> external constraints on the value that are not visible in the structure
> of the query. For example, maybe you only retrieve log data from that
> 5GB table with a timestamp range that selects a few minutes of data -
> but without specific parameter values, the planner doesn't know that.
>
> "server side prepared statement" is the JDBC driver jargon for "uses a
> persistent, named, statement at the protocol level". The mechanism for
> passing parameters remains the same in both cases - the difference is
> that in one case a one-shot unnamed statement is used, in the other case
> a persistent server-side named statement is used.
>
> The distinction is there precisely because there *ARE* known performance
> issues on the server side with some queries when using a named statements.
>
> There is not much more the driver can do than give you tuning knobs for
> the problematic cases. Anything else is going to have to happen on the
> server side to make the planner smarter in these cases (and I'm not even
> sure if it CAN be smarter with the information it has)

Thank you, Oliver! This clarifies a lot!

Do I understand it correctly that in the case of named statements, the
query planner makes an assumption about the selectivity of the actual
values to create the plan which will be cached and reused? If so,
wouldn't it be possible for the planner to
1. also cache the selectivity estimations used for the cached plan;
2. check the actual selectivity of parameters before each execution;
3. create (and execute) a "temporary plan" for those "unexpected"
parameters whose actual selectivity values differ significantly from
the selectivity values assumed for the cached plan.

Assuming that the cost of looking up actual selectivity is
insignificant compared to the cost of (re)creating a cached query
plan, this could avoid the kind of unexpected performance death this
thread is about. (And let's face it: being smart about optimizing
query execution is a much easier task for the query planner than for
the application developer. The query planner doesn't know the
parameters beforehand, but is at least in possession of detailed
low-level statistics. The application developer (typically) doesn't
know either the parameters beforehand (they are specified by the user)
or the table/index statistics.)

Thanks,
Peter

>
> -O
>

In response to

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Thomas Kellerer 2009-05-04 09:03:17 Re: Very strange performance decrease when reusing a PreparedStatement
Previous Message Oliver Jowett 2009-05-04 00:43:57 Re: getTiIme/Timestamp with TimeZone inconsistency