Skip site navigation (1) Skip section navigation (2)

Re: Getting different number of results when using hashjoin on/off

From: "Mario Weilguni" <mario(dot)weilguni(at)icomedias(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Getting different number of results when using hashjoin on/off
Date: 2005-11-28 17:41:09
Message-ID: FA095C015271B64E99B197937712FD020E4B05A4@freedom.grz.icomedias.com (view raw or flat)
Thread:
Lists: pgsql-hackers
Hello Tom,

Thanks for the quick response, I've tried the patch, but it did not work
as expected. When I set enable_hashjoin to off, everything works as
expected, but with hashjoin on I do not even get results anymore, CPU is
going up to 100% and after 3 minutes I cancelled the query (it normale
would take ~100-500 milliseconds).

I will check the patch on a different machine again and inform you of
the results.

Best regards,
	Mario Weilguni


-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us] 
Sent: Monday, November 28, 2005 6:09 PM
To: Mario Weilguni
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Getting different number of results when using
hashjoin on/off 

"Mario Weilguni" <mario(dot)weilguni(at)icomedias(dot)com> writes:
> No, I'm using 8.1.0, and tried it on different machines, always the
same results.

I see it, I think: the recent changes to avoid work when one or the
other side of the hash join is empty would exit the hash join leaving
a state that confused ExecReScanHashJoin() into thinking it didn't
have to do anything.  Try the attached patch.

			regards, tom lane


Index: src/backend/executor/nodeHashjoin.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/nodeHashjoin.c,v
retrieving revision 1.75.2.1
diff -c -r1.75.2.1 nodeHashjoin.c
*** src/backend/executor/nodeHashjoin.c	22 Nov 2005 18:23:09 -0000
1.75.2.1
--- src/backend/executor/nodeHashjoin.c	28 Nov 2005 17:04:43 -0000
***************
*** 152,163 ****
  		 * outer join, we can quit without scanning the outer
relation.
  		 */
  		if (hashtable->totalTuples == 0 && node->js.jointype !=
JOIN_LEFT)
- 		{
- 			ExecHashTableDestroy(hashtable);
- 			node->hj_HashTable = NULL;
- 			node->hj_FirstOuterTupleSlot = NULL;
  			return NULL;
- 		}
  
  		/*
  		 * need to remember whether nbatch has increased since
we began
--- 152,158 ----
***************
*** 487,493 ****
  	{
  		ExecHashTableDestroy(node->hj_HashTable);
  		node->hj_HashTable = NULL;
- 		node->hj_FirstOuterTupleSlot = NULL;
  	}
  
  	/*
--- 482,487 ----
***************
*** 805,841 ****
  ExecReScanHashJoin(HashJoinState *node, ExprContext *exprCtxt)
  {
  	/*
- 	 * If we haven't yet built the hash table then we can just
return; nothing
- 	 * done yet, so nothing to undo.
- 	 */
- 	if (node->hj_HashTable == NULL)
- 		return;
- 
- 	/*
  	 * In a multi-batch join, we currently have to do rescans the
hard way,
  	 * primarily because batch temp files may have already been
released. But
  	 * if it's a single-batch join, and there is no parameter change
for the
  	 * inner subnode, then we can just re-use the existing hash
table without
  	 * rebuilding it.
  	 */
! 	if (node->hj_HashTable->nbatch == 1 &&
! 		((PlanState *) node)->righttree->chgParam == NULL)
! 	{
! 		/* okay to reuse the hash table; needn't rescan inner,
either */
! 	}
! 	else
  	{
! 		/* must destroy and rebuild hash table */
! 		ExecHashTableDestroy(node->hj_HashTable);
! 		node->hj_HashTable = NULL;
! 		node->hj_FirstOuterTupleSlot = NULL;
  
! 		/*
! 		 * if chgParam of subnode is not null then plan will be
re-scanned by
! 		 * first ExecProcNode.
! 		 */
! 		if (((PlanState *) node)->righttree->chgParam == NULL)
! 			ExecReScan(((PlanState *) node)->righttree,
exprCtxt);
  	}
  
  	/* Always reset intra-tuple state */
--- 799,830 ----
  ExecReScanHashJoin(HashJoinState *node, ExprContext *exprCtxt)
  {
  	/*
  	 * In a multi-batch join, we currently have to do rescans the
hard way,
  	 * primarily because batch temp files may have already been
released. But
  	 * if it's a single-batch join, and there is no parameter change
for the
  	 * inner subnode, then we can just re-use the existing hash
table without
  	 * rebuilding it.
  	 */
! 	if (node->hj_HashTable != NULL)
  	{
! 		if (node->hj_HashTable->nbatch == 1 &&
! 			((PlanState *) node)->righttree->chgParam ==
NULL)
! 		{
! 			/* okay to reuse the hash table; needn't rescan
inner, either */
! 		}
! 		else
! 		{
! 			/* must destroy and rebuild hash table */
! 			ExecHashTableDestroy(node->hj_HashTable);
! 			node->hj_HashTable = NULL;
  
! 			/*
! 			 * if chgParam of subnode is not null then plan
will be re-scanned
! 			 * by first ExecProcNode.
! 			 */
! 			if (((PlanState *) node)->righttree->chgParam ==
NULL)
! 				ExecReScan(((PlanState *)
node)->righttree, exprCtxt);
! 		}
  	}
  
  	/* Always reset intra-tuple state */
***************
*** 847,852 ****
--- 836,842 ----
  	node->js.ps.ps_TupFromTlist = false;
  	node->hj_NeedNewOuter = true;
  	node->hj_MatchedOuter = false;
+ 	node->hj_FirstOuterTupleSlot = NULL;
  
  	/*
  	 * if chgParam of subnode is not null then plan will be
re-scanned by

Responses

pgsql-hackers by date

Next:From: James RobinsonDate: 2005-11-28 17:47:20
Subject: Re: Help: 8.0.3 Vacuum of an empty table never completes ...
Previous:From: Alvaro HerreraDate: 2005-11-28 17:27:52
Subject: Re: Anonymous CVS working?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group