Re: Avoiding bad prepared-statement plans.

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 14:49:35
Message-ID: 4B87DF7F.4000100@mark.mielke.cc
Lists: pgsql-hackers

On 02/26/2010 05:20 AM, Jeroen Vermeulen wrote:
> Mark Mielke wrote:
>
>> All the points about ms seem invalid to me. There are many reason why
>> ms could increase, and many of them have nothing to do with plan
>> efficiency. Again, re-planning due to a high ms, or a high ratio of
>> ms, does not indicate that re-planning will improve the success of
>> the plan. The planning process does not measure ms or predict ms.
>
> That's true, but missing some very basic points about the idea: one,
> if we can tell that a query is going to be expensive, then the cost of
> re-planning it is marginal. Two, if we can tell that a query is going
> to be expensive, then we stand a lot to gain if re-planning turns out
> to be useful. It follows that we can afford to re-plan on the
> off-chance, without anything more than a vague orders-of-magnitude
> idea of what "expensive" means.
>
> What Tom said validates a big assumption I've been making: that we do
> in fact have a decent shot at telling in advance that a query is going
> to be expensive. Which means we have a decent shot at stopping your
> 100ms query from taking seconds just because you prepared it and are
> missing out on that tiny partial index. That would be worth the extra
> planning time at a 1% hit rate, and there's not much downside if we
> don't reach that.

You trimmed most of my concerns. :-) Problems:

1) If I do a PREPARE/EXECUTE, the above lengthens the process from
one generic planning pass plus one execution of the generic plan to one
generic planning pass, one specific planning pass, and one execution of
the specific plan. This is still longer overall than a regular
statement, and it may still be longer than just running the original
generic plan. The hope is that the analysis somehow detects the scenario
where a generic plan makes no sense, but the criterion is not whether
the generic plan actually makes sense - the criterion is "can the
customer afford to wait longer for us to second-guess ourselves?" It's a
guess. As a guess, it means sometimes it will be right, and sometimes it
will be wrong.
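
To make the failure mode concrete, here is a minimal sketch of the
partial-index case mentioned above (the table, index, and statement
names are all made up for illustration):

-- Hypothetical schema: pending orders are a tiny fraction of the table.
CREATE TABLE orders (id bigint, status text, created timestamptz);
CREATE INDEX orders_pending_idx ON orders (created)
  WHERE status = 'pending';

-- The prepared statement is planned without knowing the parameter
-- value, so the generic plan cannot prove status = 'pending' and
-- cannot use the partial index.
PREPARE recent_pending(text) AS
  SELECT * FROM orders WHERE status = $1 ORDER BY created LIMIT 10;
EXECUTE recent_pending('pending');

-- Planned with the literal value, the same query can use the partial
-- index.
EXPLAIN SELECT * FROM orders WHERE status = 'pending'
  ORDER BY created LIMIT 10;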

2) Only plans whose estimates cross the "order of magnitude" threshold
will benefit. If you set the number to 100X, then most plans won't
benefit. If you set it to less than 100X, you increase the chance of
guessing wrong in other cases. In any case, there is still no guarantee
that a specific plan will be faster, so even in the 100X case, the
overall result could be slower - it's just that you've decided the
customer can afford to wait longer.
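
For what it's worth, the only number such a threshold could be based on
is the planner's own estimate for the cached generic plan, which you can
already inspect by hand (continuing the hypothetical example above):

-- Estimated cost of the cached generic plan for the prepared statement.
EXPLAIN EXECUTE recent_pending('pending');

-- Estimated cost when the planner sees the literal value. Note that
-- the re-plan-or-not decision has to be made before this second
-- estimate exists, which is why it remains a guess.
EXPLAIN SELECT * FROM orders WHERE status = 'pending'
  ORDER BY created LIMIT 10;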

>> My idea of an optimal system is as follows:
>>
>> 1) Prepare gathers and caches data about the tables involved in the
>> query, including column statistics that are likely to be required
>> during the planning process, but prepare does not run the
>> planning process.
>
> It sounds to me like you're in the process of inventing another
> planning process. Developer time aside, how much CPU time can you
> afford to throw at this?

I already said I don't think PostgreSQL could easily evolve here.
However, I wanted to point out that the problem may be architectural.

As for developer time and CPU time, that's not really relevant. If
PREPARE/EXECUTE could be reliably sped up, then the savings would
probably be measured in millions of dollars or more, as it is widely
used by many applications throughout the day on hundreds of thousands
of computers.

Oh, you mean is it worth scratching my itch? :-) Not really. I was
thinking about it yesterday and decided that such a major change might
just as easily result in a new database engine, and I didn't want to go
there.

Still, if some clever person agrees with me that this is an architecture
problem, and that PostgreSQL could benefit from a clean "from scratch"
caching mechanism for statements (note that what I described could
probably be extended to support automatically preparing every statement
and matching queries to prepared statements by text, similar to MySQL
query caching), and can come up with a way to do this within the
existing architecture - that would be great. Or they can tell me "too
hard", as you are doing. That's fine too... :-)

> I don't see any reason to argue over what would be optimal when so
> much information is still missing. It just makes the problem look
> harder than it is. To me, our best shot at getting something useful
> is to stay simple and defensive. After that, if there is still a
> need, we'll have code to help us gather more data and figure out how
> to make it better. Nothing wrong with the lowest-hanging fruit.

What information is missing?

PREPARE sucks in many known situations. It is a documented fact. :-)

Will "guessing" at when the user can afford to wait longer improve the
situation? Maybe, perhaps even often - but not always.
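
One hypothetical case where the guess loses (made-up names again): the
query is expensive enough to trigger a re-plan, but the parameter column
is evenly distributed, so the specific plan comes out identical to the
generic one and the second planning pass is pure overhead.

-- Hypothetical table where the parameter column is evenly distributed,
-- so every value is equally selective and the plan never depends on it.
CREATE TABLE events (customer_id bigint, created timestamptz,
                     payload text);
CREATE INDEX events_customer_idx ON events (customer_id);

PREPARE customer_events(bigint) AS
  SELECT * FROM events WHERE customer_id = $1 ORDER BY created;

-- Even if this looks expensive enough to trigger the proposed re-plan,
-- planning with the literal value yields the same index scan as the
-- generic plan; the only visible effect is the extra planning time.
EXECUTE customer_events(12345);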

Cheers,
mark
