Re: Very strange performance decrease when reusing a PreparedStatement

From: Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>
To: Péter Kovács <maxottovonstirlitz(at)gmail(dot)com>
Cc: Oliver Jowett <oliver(at)opencloud(dot)com>, Dave Cramer <pg(at)fastcrypt(dot)com>, Frédérik Bilhaut <frederik(dot)bilhaut(at)noopsis(dot)fr>, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Very strange performance decrease when reusing a PreparedStatement
Date: 2009-05-04 09:42:25
Message-ID: 49FEB881.8080401@nitorcreations.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

Péter Kovács wrote:
> On Mon, May 4, 2009 at 1:43 AM, Oliver Jowett <oliver(at)opencloud(dot)com> wrote:
>> Péter Kovács wrote:
>> Are you familiar with how the wire protocol works?
>>
>> If you have a query:
>>
>> SELECT * FROM foo WHERE someindexedvalue=?
>>
>> When the unnamed statement is used, a new query plan is constructed when
>> parameters are bound to the statement before execution, and the
>> *particular* parameter value provided is given to the query planner.
>>
>> The query planner then make decisions based on that actual value. For
>> example, if the table statistics say that the particular value used is
>> very selective for that column, it may decide to use an index scan over
>> a table scan.
>>
>> When a named statement is used, the server behavior changes. The query
>> is planned once, when the statement is created but before parameter
>> values are bound. The planner does not have particular parameter values
>> available, and it creates a general plan suitable for any parameter
>> value. This can result in different index selectivity information,
>> because the values that will be actually be used are not known ahead of
>> time .. which in turn can result in different plans. The resulting plan
>> may be "good" for any random value but "bad" for a particular value.
>> Often apprications are not passing in any random value - there are
>> external constraints on the value that are not visible in the structure
>> of the query. For example, maybe you only retrieve log data from that
>> 5GB table with a timestamp range that selects a few minutes of data -
>> but without specific parameter values, the planner doesn't know that.
>>
>> "server side prepared statement" is the JDBC driver jargon for "uses a
>> persistent, named, statement at the protocol level". The mechanism for
>> passing parameters remains the same in both cases - the difference is
>> that in one case a one-shot unnamed statement is used, in the other case
>> a persistent server-side named statement is used.
>>
>> The distinction is there precisely because there *ARE* known performance
>> issues on the server side with some queries when using a named statements.
>>
>> There is not much more the driver can do than give you tuning knobs for
>> the problematic cases. Anything else is going to have to happen on the
>> server side to make the planner smarter in these cases (and I'm not even
>> sure if it CAN be smarter with the information it has)
>
> Thank you, Oliver! This clarifies a lot!
>
> Do I understand it correctly that in the case of named statements, the
> query planner makes an assumption about the selectivity of the actual
> values to create the plan which will be cached and reused? If so,
> wouldn't it be possible for the planner to
> 1. also cache the selectivity estimations used for the cached plan;
> 2. check the actual selectivity of parameters before each execution;
> 3. create (and execute) a "temporary plan" for those "unexpected"
> parameters whose actual selectivity values differ significantly from
> the selectivity values assumed for the cached plan.
>
> Assuming that the cost of looking up actual selectivity is
> insignificant compared to the cost of (re)creating a cached query
> plan, this could avoid the kind of unexpected performance death this
> thread is about. (And let's face it: being smart about optimizing
> query execution is a much easier task for the query planner than for
> the application developer. The query planner doesn't know the
> parameters beforehand, but is at least in possession of detailed
> low-level statistics. The application developer (typically) doesn't
> know either the parameters beforehand (they are specified by the user)
> or the table/index statistics.)

Another option, if the server side cannot be changed, is to allow multiple
server named statements for each jdbc prepared statement. Of course the
heuristics when to do such thing is nothing easy to write, and if it needs
manual configuration most applications would not take advantage of the feature.

An easier solution would be to make the client side detect the problematic
situation. The jdbc driver would be to fetch the explain analyze from both the
exact parametrized query and for the named statement with the parameters. If the
plans have totally different execution plan then it means the generic plan is
not good enough. At this point the jdbc driver could decide not to use named
statements at all for the particular prepared statement.

I think the latter solution would fix the main problems that users are seeing
with just on-time extra work done on the n-th prepared statement execution.

-Mikko

In response to

Browse pgsql-jdbc by date

  From Date Subject
Next Message John Lister 2009-05-04 12:16:43 Re: Unit test patches
Previous Message Thomas Kellerer 2009-05-04 09:03:17 Re: Very strange performance decrease when reusing a PreparedStatement