Quick Links

Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)

From:	"Mario Weilguni" <mario(dot)weilguni(at)icomedias(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)
Date:	2005-11-28 19:14:49
Message-ID:	FA095C015271B64E99B197937712FD026679DB@freedom.grz.icomedias.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

If the query runs slow it will be not such a problem, but I was very concerned about other queries having this problem too - without knowing it. I've already rewritten the query to use IN instead of exists.

I'll compile again and try it again.

Thanks alot!

Best regards,
Mario Weilguni

-----Ursprüngliche Nachricht-----
Von: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Gesendet: Mo 28.11.2005 19:39
An: Mario Weilguni
Cc: pgsql-hackers(at)postgresql(dot)org
Betreff: Hashjoin startup strategy (was Re: [HACKERS] Getting different number of results when using hashjoin on/off)

"Mario Weilguni" <mario(dot)weilguni(at)icomedias(dot)com> writes:
> Thanks for the quick response, I've tried the patch, but it did not work
> as expected. When I set enable_hashjoin to off, everything works as
> expected, but with hashjoin on I do not even get results anymore, CPU is
> going up to 100% and after 3 minutes I cancelled the query (it normale
> would take ~100-500 milliseconds).

Try letting it run longer. I think your expectation is tuned for the
broken implementation (which runs the subqueries only once instead of
26k times...)

The test case I developed for this failure in the regression database is

select count(*) from tenk1 a
where exists (select 1 from tenk1 b, tenk1 c
where b.unique1=c.unique2 and
b.hundred in (4,5) and c.hundred=a.hundred+99);

8.0 prefers a nestloop for the subquery, and that plan runs in about
600 ms on my machine. If forced to a hash join, it takes about 2450 ms.
8.1 prefers the hash join to start with, but takes 11300 ms to run it :-(
(after the patch that is).

The reason for the differential is that 8.1 guesses wrong about which
subplan to cycle first: most of the time, the inner plan is empty and
so there's no need to pull any rows from the outer plan, but 8.1 pulls
the first row from the outer plan anyway, and doing that 10000 times is
what's eating the extra runtime. It looks from your previous message
that similar things are happening with your data distribution, allowing
8.0 to run faster for you than 8.1 does.

Not sure if there's much we can do about this. The presence of the
upper-query parameter in the subplan makes it difficult to derive any
stats at all, let alone guess how often the subplan will be completely
empty, so I'm not sure the planner can help.

For a query like this, where the hash join is being done repeatedly,
it might be useful for the executor itself to track how often each
subplan has been seen to be empty. In particular, the executor knows
that the outer subplan is parameterless and therefore should deliver
the same results each time (modulo volatile functions of course), so
after the first cycle it could know that there's no point in trying
the early fetch on that side. Dunno if this will be of wide enough
use to be worth implementing though --- in simple cases the join
won't be rescanned and so the executor can't help.

Anyone have any other ideas?

regards, tom lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Marc G. Fournier	2005-11-28 19:48:54	Re: Anonymous CVS working?
Previous Message	Marc G. Fournier	2005-11-28 19:06:47	Re: Anonymous CVS working?