Re: Avoiding bad prepared-statement plans.

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 20:26:05
Message-ID: 4B882E5D.9080209@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 02/26/2010 02:57 PM, Tom Lane wrote:
> Mark Mielke<mark(at)mark(dot)mielke(dot)cc> writes:
>
>> There must be some way to lift the cost of planning out of the plan
>> enumeration and selection phase, such that only plan enumeration and
>> selection is run at execute time. In most cases, plan enumeration and
>> selection, provided that all data required to make these decisions is
>> all cached in data structures ready to go, should be very fast? Right?
>>
> Huh? What exactly do you think the cost of planning is, if not
> enumeration and selection? There isn't very much that's cacheable,
> at least not in any sanely-sized cache.
>

I think most operations, including this one, can be broken into a fixed
portion and a dynamic portion. The PREPARE should concern itself only
with the fixed portion, and should leave the dynamic portion to EXECUTE.
At present, the "planning process" is one big blob.

Here are parts that can be done "fixed":

1) Statement parsing and error checking.
2) Identification of tables and columns involved in the query.
3) Query the column statistics for involved columns, to be used in plan
cost estimation now and later.
4) Determine plan constraints under which elements of the plan must be
executed a certain way (something like constant folding for a compiler),
or for which parameter substitution would not impact the outcome.
5) Identify the elements of the plan that still require plan enumeration
and plan selection, to be used in a later part of the pipeline.

At a minimum, I am suggesting that 1), 2), and 3) should take a chunk
out of the planning process. I think 4) and 5) are more complex but
still valuable in terms of extracting the fixed portion out of the
planning process.

I think an assumption is being made that the planning process is an
atomic unit that cannot be turned into a pipeline or assembly line. I
think this assumption was what originally tied PREPARE = PLAN, and
EXECUTE = RUN. I think this assumption is leading to the conclusion that
EXECUTE should re-plan. I also expect that this assumption is tightly
woven into the current implementation and changing it would require some
amount of re-architecture. :-)

>> By "not worth it", do you mean development effort or run time?
>>
> Run time. The development cost of what you are proposing is negligible:
> just rip out the plan cache altogether. I don't believe it would be a
> performance win though.
>

That's not my proposal, though. I'm suspecting you didn't read it. :-)

I'm fine with you saying "too hard and not worth my development effort"
after you read it. I agree it would be a lot of work.

But if the conclusion is that the current architecture is the best that
can be had, and the decision is only about when to do a custom re-plan
or when to use the generic plan, I am putting my opinion out there that
the generic plan has always been a compromise, and it will always be a
compromise, and that this discussion exists primarily because the
compromise is not adequate in many real world scenarios.

And that all said, I think I am challenging the status quo and ticking
people off. So while my intent is to challenge the status quo, it is not
to tick people off. So, please let me know if you would like me to
continue, or if you have already written this off. :-)

Cheers,
mark

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yeb Havinga 2010-02-26 20:28:01 C libpq frontend library fetchsize
Previous Message Aidan Van Dyk 2010-02-26 20:25:33 Re: Re: Hot Standby query cancellation and Streaming Replication integration