Re: An Idea for planner hints

From: Mark Dilger <pgsql(at)markdilger(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: An Idea for planner hints
Date: 2006-08-22 18:56:17
Message-ID: 44EB5351.5000704@markdilger.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Eisentraut wrote:
> Jim C. Nasby wrote:
>>> Meet EXPLAIN ANALYZE.
>> Which does no good for apps that you don't control the code on. Even
>> if you do control the code, you have to find a way to stick EXPLAIN
>> ANALYZE in front of every query, and figure out how to deal with
>> what's comming back.
>
> It would not be hard to create an "auto explain analyze" mode that
> implicitly runs EXPLAIN ANALYZE along with every query and logs the
> result. On its face, it sounds like an obviously great idea. I just
> don't see how you would put that to actual use, unless you want to read
> server logs all day long. Grepping for query duration and using the
> statistics views are much more manageable tuning methods. In my view
> anyway.
>
>> Going back to the original discussion though, there's no reason this
>> needs to involve EXPLAIN ANALYZE. All we want to know is what columns
>> the planner is dealing with as a set rather than individually.
>
> This would log a whole bunch of column groups, since every moderately
> interesting query uses a column in combination with some other column,
> but you still won't know which ones you want the planner to optimize.
>
> To get that piece of information, you'd need to do something like
> principal component analysis over the column groups thus identified.
> Which might be a fun thing to do. But for the moment I think it's
> better to stick to declaring the interesting pairs/groups manually.
>

If the system logs which cross-table join statistics it didn't have for
cross-table joins that it actually performed, it won't log the really
interesting stuff.

What is interesting are the plans that it didn't chose on account of guessing
that they were too expensive, when in reality the cross-table statistics were
such that they were not too expensive. This case might not be the common case,
but it is the interesting case. We are trying to get the planner to notice
cheap plans that don't look cheap unless you have the cross-table statistics.
So you have a chicken-and-egg problem here unless the system attempts (or
outputs without actually attempting) what appear to be sub-optimal plans in
order to determine how bad they really are.

I proposed something like this quite a bit up-thread. I was hoping we could
have a mode in which the system would run the second, third, fourth, ... best
plans rather than just the best looking one, and then determine from actual
runtime statistics which was best. (The proposal also included the ability to
output the best plan and read that in at a later time in lieu of a SQL query,
but that part of it can be ignored if you like.) The posting didn't generate
much response, so I'm not sure what people thought of it. The only major
problem I see is getting the planner to keep track of alternate plans. I don't
know the internals of it very well, but I think the genetic query optimizer
doesn't have a concept of "runner-up #1", "runner-up #2", etc., which it would
need to have.

mark

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2006-08-22 18:57:54 Re: BUG #2585: Please provide pkg-config support
Previous Message Greg Stark 2006-08-22 18:46:41 Re: Tricky bugs in concurrent index build