Re: An Idea for planner hints

From: Mark Dilger <pgsql(at)markdilger(dot)com>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: An Idea for planner hints
Date: 2006-08-09 19:26:15
Message-ID: 44DA36D7.9020902@markdilger.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jim C. Nasby wrote:
> Been suggested before... the problem is actually doing something useful
> with all that data that's collected, as well as how to collect it
> without greatly impacting the system.

Identifying the best plan by means of actually running multiple plans and timing
them is useful. That would be the point.

As far as "without greatly impacting the system", I don't think that is a real
concern. The whole idea is to greatly impact the system *once*, sometime when
the DBA doesn't mind impacting the system (like before you go live on a
production network, or between midnight and 3 AM, or whatever), and then store
the best plan for future use.

The planner trades-off the desire to find the best plan and the need to find a
plan quickly. It also chooses a plan based on statistics and not based on
actual runtimes (because there is a chicken-and-egg problem: how do you know
which plan has the smallest runtime without running it?), so the chosen plan
that looks best based on statistics might not actually be best.

The idea I'm proposing circumvents the whole trade-off problem by explicitly
choosing to do something that makes the planner run really slowly and take a
really long time. But it doesn't do it "at runtime", in the sense that you
don't do it for each query. You just do it once up front and be done with it.
Of course, this is only useful for people with reasonably static queries and
reasonably static table statistics, so that a good plan found up-front continues
to be a good plan as it is repeatedly used.

My personal motivation is that I have tables whose statistics are quite static.
The data itself changes, but the statistical distribution from which the data
is pulled is unchanging, so the table statistics end up about the same even as
the data itself is added and deleted. On top of that, the planner keeps
choosing the wrong plan, which I know to be true because I can make individual
queries run faster by structuring them in ways that the planner can't see
through and "optimize" away the particular plan that I am effectively giving it.
But this is a PITA for me, especially since I don't always know what the best
plan might be and have to try them all until I find the right one. (With the
added complexity that I can't always figure out how to trick the planner into
choosing a specific plan, and hence can't test it.) It would be *so much
easier* to have an option to tell the planner to try them all.

mark

> On Tue, Aug 08, 2006 at 08:23:05AM -0700, Mark Dilger wrote:
>> If this feature I'm proposing already exists, sorry for the waste of
>> bandwidth, and could someone please point me to it? :)
>>
>> What if there were a mode that told postgres to do an exhaustive search (or
>> if not exhaustive, then much more extensive search) of all plans (or many
>> plans), trying each plan, reporting the performance of each, and discarding
>> the query results, much like "explain analyze" does. Postgres could then
>> dump the best plan in machine readable (and semi-human readable) form which
>> the planner could parse and use at some later date in lieu of a SQL query.
>>
>> This would allow people with reasonably static table statistics (where the
>> best plan is not likely to change) to spend upfront cycles investigating
>> the best plan and then embed that plan in their business logic. Since the
>> stored plan is both written-by and read-by postgres, it can get quite
>> complicated without putting a burden on humans to read and write such
>> complicated things. It would also remove the risk that the planner will
>> occasionally (due to its nondeterministic workings) choose a really bad
>> plan and stall a production system.
>>
>> mark

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-08-09 19:26:59 Re: An Idea for planner hints
Previous Message Jim C. Nasby 2006-08-09 19:14:18 Re: An Idea for planner hints