Re: BUG #6668: hashjoin cost problem

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Postgres User <postgresuser(at)yahoo(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #6668: hashjoin cost problem
Date: 2012-05-31 22:48:01
Message-ID: 23375.1338504481@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Postgres User <postgresuser(at)yahoo(dot)com> writes:
> Why is cost_hashjoin estimating 50 billion tuple comparisons for 10K rows of output though?

Well, if it hashes the smaller table, there's 100 million rows on the
outside, and each of them will probe one hash chain in the hash table.
If you're unlucky, each of those probes will hit a populated hash chain
with at least 1000 entries, leading to 100 billion comparisons. I think
it might derate that worst-case by a factor of 2. Now if you're lucky,
a lot of the outer tuples hit unpopulated hash chains and so the number
of comparisons is a lot less --- but in non-artificial examples,
that's not a very good bet to make. The conservative assumption is that
both sides of the join have similar key distributions, so that the more
populated hash chains are also more likely to be probed. The cost
estimate is therefore designed to discriminate against using an inner
relation with a non-flat distribution.

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2012-05-31 22:53:17 Re: BUG #6671: Killed restore command causes postmaster to exit
Previous Message zaks.anna 2012-05-31 22:35:39 BUG #6672: Memory leaks in dumputils.c