Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mario Weilguni" <mario(dot)weilguni(at)icomedias(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)
Date: 2005-11-28 18:39:52
Message-ID: 13031.1133203192@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Mario Weilguni" <mario(dot)weilguni(at)icomedias(dot)com> writes:
> Thanks for the quick response, I've tried the patch, but it did not work
> as expected. When I set enable_hashjoin to off, everything works as
> expected, but with hashjoin on I do not even get results anymore, CPU is
> going up to 100% and after 3 minutes I cancelled the query (it normale
> would take ~100-500 milliseconds).

Try letting it run longer. I think your expectation is tuned for the
broken implementation (which runs the subqueries only once instead of
26k times...)

The test case I developed for this failure in the regression database is

select count(*) from tenk1 a
where exists (select 1 from tenk1 b, tenk1 c
where b.unique1=c.unique2 and
b.hundred in (4,5) and c.hundred=a.hundred+99);

8.0 prefers a nestloop for the subquery, and that plan runs in about
600 ms on my machine. If forced to a hash join, it takes about 2450 ms.
8.1 prefers the hash join to start with, but takes 11300 ms to run it :-(
(after the patch that is).

The reason for the differential is that 8.1 guesses wrong about which
subplan to cycle first: most of the time, the inner plan is empty and
so there's no need to pull any rows from the outer plan, but 8.1 pulls
the first row from the outer plan anyway, and doing that 10000 times is
what's eating the extra runtime. It looks from your previous message
that similar things are happening with your data distribution, allowing
8.0 to run faster for you than 8.1 does.

Not sure if there's much we can do about this. The presence of the
upper-query parameter in the subplan makes it difficult to derive any
stats at all, let alone guess how often the subplan will be completely
empty, so I'm not sure the planner can help.

For a query like this, where the hash join is being done repeatedly,
it might be useful for the executor itself to track how often each
subplan has been seen to be empty. In particular, the executor knows
that the outer subplan is parameterless and therefore should deliver
the same results each time (modulo volatile functions of course), so
after the first cycle it could know that there's no point in trying
the early fetch on that side. Dunno if this will be of wide enough
use to be worth implementing though --- in simple cases the join
won't be rescanned and so the executor can't help.

Anyone have any other ideas?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-11-28 18:46:53 Re: Help: 8.0.3 Vacuum of an empty table never completes ...
Previous Message James Robinson 2005-11-28 17:47:20 Re: Help: 8.0.3 Vacuum of an empty table never completes ...