Re: Fuzzy cost comparison to eliminate redundant planning

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fuzzy cost comparison to eliminate redundant planning
Date: 2004-03-29 06:01:23
Message-ID: 200403290601.i2T61NY13559@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> The reason these are all being kept is that add_path() will keep paths
> of the same ordering (same pathkeys) if they rank differently on startup
> cost than they do on total cost. Now that's a good rule in the
> abstract, but a human looking at these numbers would certainly say that
> it's being taken to extremes. There isn't enough difference in these
> startup costs to justify treating them as different ... especially given
> the inherent inaccuracy of the estimates.
>
> As an experiment, I made the attached patch that converts add_path's
> cost comparisons to "fuzzy" comparisons --- two paths are considered to
> have the same cost if it differs by less than 1% of the smaller
> total_cost. I found that this reduced the planning time of Eric's
> query by about 40%, without changing the resulting plan. On more
> typical queries where the relations all have different statistics,
> I doubt it would have as much effect, but I'm inclined to apply the
> change anyway.

I like the 1% idea. It is sort of like a cheap GEQO --- throwing out
plans that have very similar costs. We still test all plans (unlike
GEQO) but we trim more agressively than we do now.

Do we know in the optimizer whether we will be needing cheapest startup
or not? Could we eliminate cheapest startup plans when they aren't
needed?

In the pre-cheapest path code, we really only kept cheapest cost, plus
plans with pathkeys. Now that we have cheapest, do we need to rethink
what plans we keep?

Let's look at the plans you listed:

{NESTPATH
:startup_cost 0.01
:total_cost 1634.63
:pathkeys <>
}

{NESTPATH
:startup_cost 0.005
:total_cost 1642.65
:pathkeys <>
}

{NESTPATH
:startup_cost 0.00
:total_cost 1650.66
:pathkeys <>
}

They are listed in order of increasing total cost, and decreasing
startup cost. Why is the middle one kept?

You state:

> The reason these are all being kept is that add_path() will keep paths
> of the same ordering (same pathkeys) if they rank differently on startup
> cost than they do on total cost. Now that's a good rule in the

Is the middle one kept because the optimizer has to mix the startup plus
some percentage of the total cost for queries using LIMIT?

Should we require that kept plans have to have a greater difference in
startup cost compared to total cost. For example:

{NESTPATH
:startup_cost 0.01
:total_cost 1634.63
:pathkeys <>
}

{NESTPATH
:startup_cost 0.005
:total_cost 1642.65
:pathkeys <>
}

In this case, the total cost difference is +8, but the startup cost
difference is only -0.005.

Or looking at these plans:

{NESTPATH
:startup_cost 0.01
:total_cost 1634.63
:pathkeys <>
}

{NESTPATH
:startup_cost 0.005
:total_cost 1642.65
:pathkeys <>
}

{NESTPATH
:startup_cost 0.00
:total_cost 1650.66
:pathkeys <>
}

If you are selecting the whole table, you pick the first first plan, and
if you are selecting only one row, you pick the last plan. Looking at
some computations:

> 1634.63 * 0.0001 + 0.01
.173463
> 1642.65 * 0.0001 + 0.005
.169265
> 1650.66 * 0.0001
.165066

Pulling 0.01% of the table has us prefering the last plan, while

> 1634.63 * 0.001 + 0.01
1.64463
> 1642.65 * 0.001 + 0.005
1.64765
> 1650.66 * 0.001
1.65066

has us pulling from the first plan, so the useful range of the second
query is between 0.01% and 0.001% of the table. That's a pretty tiny
area of usefulness for the second plan.

However, I can't think of how to measure the range of percentage of
usefulness a plan.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Eric Ridge 2004-03-29 06:01:56 Re: User Defined Functions/AM's inherently slow?
Previous Message Joe Conway 2004-03-29 05:49:16 Re: Fuzzy cost comparison to eliminate redundant planning