Skip site navigation (1) Skip section navigation (2)

Re: Planner doesn't look at LIMIT?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dawid Kuroczko <qnex42(at)gmail(dot)com>
Cc: pgsql-performance(at)postgreSQL(dot)org, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Planner doesn't look at LIMIT?
Date: 2005-07-22 16:20:20
Message-ID: 6440.1122049220@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
I wrote:
> Dawid Kuroczko <qnex42(at)gmail(dot)com> writes:
>> qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;

>> Limit  (cost=15912.20..15912.31 rows=1 width=272)
>> ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)

>> If I set enable_hashjoin=false:

>> qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;

>> Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216
>> rows=1 loops=1)
>> ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336
>> width=272) (actual time=74.204..74.204 rows=1 loops=1)

> This is quite strange.  The nestloop plan definitely should be preferred
> in the context of the LIMIT, considering that it has far lower estimated
> cost.  And it is preferred in simple tests for me.

After a suitable period of contemplating my navel, I figured out
what is going on here: the total costs involved are large enough that
the still-fairly-high startup cost of the hash is disregarded by
compare_fuzzy_path_costs(), and so the nestloop is discarded as not
having any significant potential advantage in startup time.

I think that this refutes the original scheme of using the same fuzz
factor for both startup and total cost comparisons, and therefore
propose the attached patch.

Comments?

			regards, tom lane

*** src/backend/optimizer/util/pathnode.c.orig	Fri Jul 15 13:09:25 2005
--- src/backend/optimizer/util/pathnode.c	Fri Jul 22 12:08:25 2005
***************
*** 98,157 ****
  static int
  compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
  {
- 	Cost		fuzz;
- 
  	/*
! 	 * The fuzz factor is set at one percent of the smaller total_cost,
! 	 * but not less than 0.01 cost units (just in case total cost is
! 	 * zero).
  	 *
  	 * XXX does this percentage need to be user-configurable?
  	 */
- 	fuzz = Min(path1->total_cost, path2->total_cost) * 0.01;
- 	fuzz = Max(fuzz, 0.01);
- 
  	if (criterion == STARTUP_COST)
  	{
! 		if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
! 		{
! 			if (path1->startup_cost < path2->startup_cost)
! 				return -1;
! 			else
! 				return +1;
! 		}
  
  		/*
  		 * If paths have the same startup cost (not at all unlikely),
  		 * order them by total cost.
  		 */
! 		if (Abs(path1->total_cost - path2->total_cost) > fuzz)
! 		{
! 			if (path1->total_cost < path2->total_cost)
! 				return -1;
! 			else
! 				return +1;
! 		}
  	}
  	else
  	{
! 		if (Abs(path1->total_cost - path2->total_cost) > fuzz)
! 		{
! 			if (path1->total_cost < path2->total_cost)
! 				return -1;
! 			else
! 				return +1;
! 		}
  
  		/*
  		 * If paths have the same total cost, order them by startup cost.
  		 */
! 		if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
! 		{
! 			if (path1->startup_cost < path2->startup_cost)
! 				return -1;
! 			else
! 				return +1;
! 		}
  	}
  	return 0;
  }
--- 98,138 ----
  static int
  compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
  {
  	/*
! 	 * We use a fuzz factor of 1% of the smaller cost.
  	 *
  	 * XXX does this percentage need to be user-configurable?
  	 */
  	if (criterion == STARTUP_COST)
  	{
! 		if (path1->startup_cost > path2->startup_cost * 1.01)
! 			return +1;
! 		if (path2->startup_cost > path1->startup_cost * 1.01)
! 			return -1;
  
  		/*
  		 * If paths have the same startup cost (not at all unlikely),
  		 * order them by total cost.
  		 */
! 		if (path1->total_cost > path2->total_cost * 1.01)
! 			return +1;
! 		if (path2->total_cost > path1->total_cost * 1.01)
! 			return -1;
  	}
  	else
  	{
! 		if (path1->total_cost > path2->total_cost * 1.01)
! 			return +1;
! 		if (path2->total_cost > path1->total_cost * 1.01)
! 			return -1;
  
  		/*
  		 * If paths have the same total cost, order them by startup cost.
  		 */
! 		if (path1->startup_cost > path2->startup_cost * 1.01)
! 			return +1;
! 		if (path2->startup_cost > path1->startup_cost * 1.01)
! 			return -1;
  	}
  	return 0;
  }

In response to

Responses

pgsql-performance by date

Next:From: Patrick WelcheDate: 2005-07-22 16:49:05
Subject: Re: COPY FROM performance improvements
Previous:From: Dawid KuroczkoDate: 2005-07-22 16:09:37
Subject: Re: Planner doesn't look at LIMIT?

pgsql-hackers by date

Next:From: Rocco AltierDate: 2005-07-22 16:21:08
Subject: Re: [HACKERS] regressin failure on latest CVS
Previous:From: Dawid KuroczkoDate: 2005-07-22 16:09:37
Subject: Re: Planner doesn't look at LIMIT?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group