Skip site navigation (1) Skip section navigation (2)

Re: Avoiding bad prepared-statement plans.

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 14:49:35
Message-ID: 4B87DF7F.4000100@mark.mielke.cc (view raw or flat)
Thread:
Lists: pgsql-hackers
On 02/26/2010 05:20 AM, Jeroen Vermeulen wrote:
> Mark Mielke wrote:
>
>> All the points about ms seem invalid to me. There are many reason why 
>> ms could increase, and many of them have nothing to do with plan 
>> efficiency. Again, re-planning due to a high ms, or a high ratio of 
>> ms, does not indicate that re-planning will improve the success of 
>> the plan. The planning process does not measure ms or predict ms.
>
> That's true, but missing some very basic points about the idea: one, 
> if we can tell that a query is going to be expensive, then the cost of 
> re-planning it is marginal.  Two, if we can tell that a query is going 
> to be expensive, then we stand a lot to gain if re-planning turns out 
> to be useful.  It follows that we can afford to re-plan on the 
> off-chance, without anything more than a vague orders-of-magnitude 
> idea of what "expensive" means.
>
> What Tom said validates a big assumption I've been making: that we do 
> in fact have a decent shot at telling in advance that a query is going 
> to be expensive.  Which means we have a decent shot at stopping your 
> 100ms query from taking seconds just because you prepared it and are 
> missing out on that tiny partial index.  That would be worth the extra 
> planning time at a 1% hit rate, and there's not much downside if we 
> don't reach that.

You trimmed most of my concerns. :-) Problems:

     1) If I do a PREPARE/EXECUTE, the above lengthens the process from 
1 generic planning plus 1 generic plan execute to 1 generic planning, 1 
specific planning, and 1 specific plan execution. This is still overall 
longer than a regular statement and it still may be longer than the 
original generic plan on its own. The hope is that the analysis is 
somehow detecting the scenario where a generic plan makes no sense, but 
the criteria is not about whether the generic plan actually does make 
sense - the criteria is "can the customer afford to wait longer for us 
to second guess ourselves?" It's a guess. As a guess, it means sometimes 
it will be right, and sometimes it will be wrong.

     2) Only the "order of magnitude" (by estimate) plans will benefit. 
If you set the number to 100X, then most plans won't benefit. If you set 
it to less than 100X, you increase the chance of guessing wrong in other 
cases. In any case, there is still no guarantee that a specific plan 
will be faster, so even in the 100X case, the overall results could be 
slower - it's just that you've decided the customer can afford to wait 
longer.

>> My idea of an optimal system is as follows:
>>
>> 1) Prepare gathers and caches data about the tables involved in the 
>> query, including column statistics that are likely to be required 
>> during the planning process, but prepare does not running the 
>> planning process.
>
> It sounds to me like you're in the process of inventing another 
> planning process.  Developer time aside, how much CPU time can you 
> afford to throw at this?

I already said I don't think PostgreSQL could easily evolve here. 
However, I wanted to point out that the problem may be architectural.

As for developer time and CPU time, that's not really relevant. If 
PREPARE/EXECUTE could be reliably sped up, than the savings is probably 
measure in millions of dollars or more, as it is widely used by many 
applications throughout the day on hundreds of thousands of computers.

Oh, you mean is it worth scratching my itch? :-) Not really. I was 
thinking about it yesterday and decided that such a major change might 
just as easily result in a new database engine, and I didn't want to go 
there.

Still, if some clever person agrees with me that it is an architecture 
problem, and that PostgreSQL could benefit from a clean "from scratch" 
caching mechanism for statements (note that what I described could 
probably be extended to support automatic prepare of every statement, 
and matching of query to prepared statement based on text, similar to 
MySQL query caching), and can come up with a way to do this using the 
existing architecture - that would be great. Or, they can tell me "too 
hard" as you are. That's fine too... :-)

> I don't see any reason to argue over what would be optimal when so 
> much information is still missing.  It just makes the problem look 
> harder than it is.  To me, our best shot at getting something useful 
> is to stay simple and defensive.  After that, if there is still a 
> need, we'll have code to help us gather more data and figure out how 
> to make it better. Nothing wrong with the lowest-hanging fruit.

What information is missing?

PREPARE sucks in many known situations. It is a documented fact. :-)

Will "guessing" at when the user can afford to wait longer improve the 
situation? Maybe or often, but not always.

Cheers,
mark


In response to

Responses

pgsql-hackers by date

Next:From: Richard HuxtonDate: 2010-02-26 14:57:16
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Previous:From: Heikki LinnakangasDate: 2010-02-26 14:45:07
Subject: Re: Hot Standby query cancellation and Streaming Replication integration

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group