From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane) |
Cc: | lockhart(at)alumni(dot)caltech(dot)edu, pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Optimizer speed and GEQO (was: nested loops in joins) |
Date: | 1999-02-06 17:51:52 |
Message-ID: | 199902061751.MAA03085@candle.pha.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> >> Why are we maintaining this huge Path List for every RelOptInfo
> >> structure if we only need the cheapest?
>
> I think Bruce is onto something here... keep only the best-so-far
> not everything ever generated. That'd get rid of the O(N^2)
> comparison behavior.
>
> The only thing I can think of that we might possibly *need* the
> whole list for is if it is used as a recursion stopper.
> ("Oh, I already investigated that alternative, no need to go down
> that path again.") It did not look to me like the optimizer does
> such a thing, but I don't claim to understand the code.
>
> It seems to me that the search space of possible paths is
> well-structured and can be enumerated without generation of duplicates.
> You've got a known set of tables involved in a query, a fixed set of
> possible access methods for each table, and only so many ways to
> combine them. So the real question here is why does the optimizer
> even need to check for duplicates --- should it not never generate
> any to begin with? The profiles I ran a few days ago show that indeed
> most of the generated paths are not duplicates, but a small fraction
> are duplicates. Why is that? I'd feel a lot closer to understanding
> what's going on if we could explain where the duplicates come from.
Here is my guess, and I think it is valid. There are cases where a join
of two tables may be more expensive than another plan, but the order of
the result may be cheaper to use for a later join than the cheaper plan.
That is why I suspect they are doing in better_path():
if (samekeys(path->keys, new_path->keys) &&
equal_path_ordering(&path->p_ordering,
&new_path->p_ordering))
{
old_path = path;
break;
}
So I don't think we can grab the cheapest path right away, but I think
the system is keeping around non-cheapest paths with the same result
ordering. I also would like to throw away plans that are so expensive
that sorting another plan to the desired ordering would be cheaper than
using the plan.
I am only guessing on this. How often does this 'keep non-cheapest path
around' really help?
--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 1999-02-06 20:41:49 | Re: [HACKERS] Re: SELECT INTO TABLE busted? |
Previous Message | Jan Wieck | 1999-02-06 17:28:16 | Re: [HACKERS] strange behaviour on pooled alloc |