Skip site navigation (1) Skip section navigation (2)

Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mario Weilguni" <mario(dot)weilguni(at)icomedias(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)
Date: 2005-11-28 23:49:40
Message-ID: 17170.1133221780@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-hackers
I wrote:
> For a query like this, where the hash join is being done repeatedly,
> it might be useful for the executor itself to track how often each
> subplan has been seen to be empty.

I implemented a simple form of this, and it made 8.1 faster than 8.0
on the test case I was using.  Give it a try ...

			regards, tom lane


Index: src/backend/executor/nodeHashjoin.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/nodeHashjoin.c,v
retrieving revision 1.75.2.2
diff -c -r1.75.2.2 nodeHashjoin.c
*** src/backend/executor/nodeHashjoin.c	28 Nov 2005 17:14:47 -0000	1.75.2.2
--- src/backend/executor/nodeHashjoin.c	28 Nov 2005 23:41:28 -0000
***************
*** 120,135 ****
  		 * since we aren't going to be able to skip the join on the strength
  		 * of an empty inner relation anyway.)
  		 *
  		 * The only way to make the check is to try to fetch a tuple from the
  		 * outer plan node.  If we succeed, we have to stash it away for later
  		 * consumption by ExecHashJoinOuterGetTuple.
  		 */
! 		if (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost ||
! 			node->js.jointype == JOIN_LEFT)
  		{
  			node->hj_FirstOuterTupleSlot = ExecProcNode(outerNode);
  			if (TupIsNull(node->hj_FirstOuterTupleSlot))
  				return NULL;
  		}
  		else
  			node->hj_FirstOuterTupleSlot = NULL;
--- 120,147 ----
  		 * since we aren't going to be able to skip the join on the strength
  		 * of an empty inner relation anyway.)
  		 *
+ 		 * If we are rescanning the join, we make use of information gained
+ 		 * on the previous scan: don't bother to try the prefetch if the
+ 		 * previous scan found the outer relation nonempty.  This is not
+ 		 * 100% reliable since with new parameters the outer relation might
+ 		 * yield different results, but it's a good heuristic.
+ 		 *
  		 * The only way to make the check is to try to fetch a tuple from the
  		 * outer plan node.  If we succeed, we have to stash it away for later
  		 * consumption by ExecHashJoinOuterGetTuple.
  		 */
! 		if (node->js.jointype == JOIN_LEFT ||
! 			(outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
! 			 !node->hj_OuterNotEmpty))
  		{
  			node->hj_FirstOuterTupleSlot = ExecProcNode(outerNode);
  			if (TupIsNull(node->hj_FirstOuterTupleSlot))
+ 			{
+ 				node->hj_OuterNotEmpty = false;
  				return NULL;
+ 			}
+ 			else
+ 				node->hj_OuterNotEmpty = true;
  		}
  		else
  			node->hj_FirstOuterTupleSlot = NULL;
***************
*** 159,164 ****
--- 171,183 ----
  		 * scanning the outer relation
  		 */
  		hashtable->nbatch_outstart = hashtable->nbatch;
+ 
+ 		/*
+ 		 * Reset OuterNotEmpty for scan.  (It's OK if we fetched a tuple
+ 		 * above, because ExecHashJoinOuterGetTuple will immediately
+ 		 * set it again.)
+ 		 */
+ 		node->hj_OuterNotEmpty = false;
  	}
  
  	/*
***************
*** 454,459 ****
--- 473,479 ----
  	hjstate->js.ps.ps_TupFromTlist = false;
  	hjstate->hj_NeedNewOuter = true;
  	hjstate->hj_MatchedOuter = false;
+ 	hjstate->hj_OuterNotEmpty = false;
  
  	return hjstate;
  }
***************
*** 546,551 ****
--- 566,574 ----
  			*hashvalue = ExecHashGetHashValue(hashtable, econtext,
  											  hjstate->hj_OuterHashKeys);
  
+ 			/* remember outer relation is not empty for possible rescan */
+ 			hjstate->hj_OuterNotEmpty = true;
+ 
  			return slot;
  		}
  
***************
*** 810,816 ****
  		if (node->hj_HashTable->nbatch == 1 &&
  			((PlanState *) node)->righttree->chgParam == NULL)
  		{
! 			/* okay to reuse the hash table; needn't rescan inner, either */
  		}
  		else
  		{
--- 833,851 ----
  		if (node->hj_HashTable->nbatch == 1 &&
  			((PlanState *) node)->righttree->chgParam == NULL)
  		{
! 			/*
! 			 * okay to reuse the hash table; needn't rescan inner, either.
! 			 *
! 			 * What we do need to do is reset our state about the emptiness
! 			 * of the outer relation, so that the new scan of the outer will
! 			 * update it correctly if it turns out to be empty this time.
! 			 * (There's no harm in clearing it now because ExecHashJoin won't
! 			 * need the info.  In the other cases, where the hash table
! 			 * doesn't exist or we are destroying it, we leave this state
! 			 * alone because ExecHashJoin will need it the first time
! 			 * through.)
! 			 */
! 			node->hj_OuterNotEmpty = false;
  		}
  		else
  		{
Index: src/include/nodes/execnodes.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/nodes/execnodes.h,v
retrieving revision 1.139.2.2
diff -c -r1.139.2.2 execnodes.h
*** src/include/nodes/execnodes.h	22 Nov 2005 18:23:28 -0000	1.139.2.2
--- src/include/nodes/execnodes.h	28 Nov 2005 23:41:28 -0000
***************
*** 1101,1106 ****
--- 1101,1107 ----
   *		hj_FirstOuterTupleSlot	first tuple retrieved from outer plan
   *		hj_NeedNewOuter			true if need new outer tuple on next call
   *		hj_MatchedOuter			true if found a join match for current outer
+  *		hj_OuterNotEmpty		true if outer relation known not empty
   * ----------------
   */
  
***************
*** 1125,1130 ****
--- 1126,1132 ----
  	TupleTableSlot *hj_FirstOuterTupleSlot;
  	bool		hj_NeedNewOuter;
  	bool		hj_MatchedOuter;
+ 	bool		hj_OuterNotEmpty;
  } HashJoinState;
  
  

In response to

pgsql-hackers by date

Next:From: Greg StarkDate: 2005-11-29 00:25:58
Subject: Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)
Previous:From: Bruce MomjianDate: 2005-11-28 23:19:27
Subject: Re: NVL vs COALESCE

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group