Re: BUG #6668: hashjoin cost problem

From: Postgres User <postgresuser(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #6668: hashjoin cost problem
Date: 2012-05-31 05:57:17
Message-ID: 1338443837.61025.YahooMailNeo@web121105.mail.ne1.yahoo.com
Lists: pgsql-bugs

Why is cost_hashjoin estimating 50 billion tuple comparisons for 10K rows of output though?
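
For context, here is a rough back-of-the-envelope reading of where a number of
that size could come from.  As I understand cost_hashjoin, it charges roughly
outer_rows * (inner_rows * innerbucketsize) * 0.5 tuple comparisons, and
estimate_hash_bucketsize() lands near the most-common-value frequency of the
hashed column -- about 0.1 here, since each of the ~10 values in small.i covers
~1000 of its 10000 rows.  (The exact formula may differ; this is only a sketch.)

select 100000000::bigint      -- outer rows (large)
     * (10000 * 0.1)          -- estimated entries per populated bucket (~1000)
     * 0.5;                   -- half the chain scanned per probe, on average
-- => 50,000,000,000, i.e. the 50 billion comparisons, even though the
--    join itself only returns about 10000 rows.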

________________________________
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: postgresuser(at)yahoo(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Sent: Wednesday, May 30, 2012 10:03 PM
Subject: Re: [BUGS] BUG #6668: hashjoin cost problem

postgresuser(at)yahoo(dot)com writes:
> create table small(i) as select (g/1000) * 1000 from
> generate_series(1,10000) g;
> create table large(i) as select generate_series(1,100000000);

> It doesn't matter how big the big table is... for this distribution large
> table is hashed.

I don't think that's wrong.  If it hashes the small table, there cannot
be less than 1000 entries on each populated hash chain; adding more
work_mem doesn't help.  The planner is designed to avoid hashing such
unfriendly distributions as that.  The fact that you can get a somewhat
smaller runtime by forcing hashing in the other direction suggests that
its cost factors are not quite right for your specific case --- but it's
a long way from that observation to deciding that we should change the
cost factors for everyone.  In any case, the sizes of the tables are not
the only determinant of which one should be hashed.

            regards, tom lane
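
For what it's worth, the skew Tom describes is easy to see directly in the test
data.  A quick look at the value distribution of the small table from the quoted
test case shows why any hash table built on it must have long chains:

select i, count(*) from small group by i order by i;
-- about 11 distinct values at ~1000 rows apiece, so every populated hash
-- chain carries ~1000 entries no matter how much work_mem is available.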
