Re: [HACKERS] Optimizer speed and GEQO (was: nested loops in joins)

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane)
Cc: lockhart(at)alumni(dot)caltech(dot)edu, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Optimizer speed and GEQO (was: nested loops in joins)
Date: 1999-02-06 17:51:52
Message-ID: 199902061751.MAA03085@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> >> Why are we maintaining this huge Path List for every RelOptInfo
> >> structure if we only need the cheapest?
>
> I think Bruce is onto something here... keep only the best-so-far
> not everything ever generated. That'd get rid of the O(N^2)
> comparison behavior.
>
> The only thing I can think of that we might possibly *need* the
> whole list for is if it is used as a recursion stopper.
> ("Oh, I already investigated that alternative, no need to go down
> that path again.") It did not look to me like the optimizer does
> such a thing, but I don't claim to understand the code.
>
> It seems to me that the search space of possible paths is
> well-structured and can be enumerated without generation of duplicates.
> You've got a known set of tables involved in a query, a fixed set of
> possible access methods for each table, and only so many ways to
> combine them. So the real question here is why does the optimizer
> even need to check for duplicates --- should it not never generate
> any to begin with? The profiles I ran a few days ago show that indeed
> most of the generated paths are not duplicates, but a small fraction
> are duplicates. Why is that? I'd feel a lot closer to understanding
> what's going on if we could explain where the duplicates come from.

Here is my guess, and I think it is valid. There are cases where a join
of two tables may be more expensive than another plan, but the order of
the result may be cheaper to use for a later join than the cheaper plan.
That is why I suspect they are doing in better_path():

if (samekeys(path->keys, new_path->keys) &&
equal_path_ordering(&path->p_ordering,
&new_path->p_ordering))
{
old_path = path;
break;
}

So I don't think we can grab the cheapest path right away, but I think
the system is keeping around non-cheapest paths with the same result
ordering. I also would like to throw away plans that are so expensive
that sorting another plan to the desired ordering would be cheaper than
using the plan.

I am only guessing on this. How often does this 'keep non-cheapest path
around' really help?

--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 1999-02-06 20:41:49 Re: [HACKERS] Re: SELECT INTO TABLE busted?
Previous Message Jan Wieck 1999-02-06 17:28:16 Re: [HACKERS] strange behaviour on pooled alloc