Re: Horribly slow hash join

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: pgsql-performance(at)postgreSQL(dot)org
Subject: Re: Horribly slow hash join
Date: 2004-04-17 16:08:15
Message-ID: 10603.1082218095@sss.pgh.pa.us
Lists: pgsql-performance

[ resending because I fat-fingered the cc: to the list ]

I see the problem: all the entries in your work_units column have the
low 32 bits equal to zero.

regression=# select distinct work_units % (2^32)::bigint from Trank_work_overall;
 ?column? 
----------
        0
(1 row)

The hash function for int8 only takes the low word into account, so all
of the entries end up on the same hash chain, resulting in worst-case
behavior. This applies to both your hash join and hash aggregate cases.

We could change the hash function, perhaps, but then we'd just have
different cases where there's a problem ... hashing will always fail on
*some* set of inputs. (Also, I have been harboring some notions of
supporting cross-type hash joins for integer types, which will not work
unless small int8 values hash the same as int4 etc.)

I guess the real issue is why are you encoding work_units like that?

regards, tom lane
