Re: question on hash joins

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Hartranft, Robert M(dot) (GSFC-423(dot)0)[RAYTHEON CO]" <robert(dot)m(dot)hartranft(at)nasa(dot)gov>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: question on hash joins
Date: 2017-10-18 21:17:06
Message-ID: 22675.1508361426@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

"Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]" <robert(dot)m(dot)hartranft(at)nasa(dot)gov> writes:
> explain select count(1) from table1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=18200480.80..18200480.81 rows=1 width=8)
> -> Hash Join (cost=103206.82..18190602.43 rows=3951347 width=0)
> Hash Cond: (x.granuleid = g.granuleid)
> -> Seq Scan on table2 x (cost=0.00..10596253.01 rows=644241901 width=8)
> -> Hash (cost=92363.72..92363.72 rows=660888 width=8)
> -> Index Only Scan using idx_table1 on table1 g (cost=0.57..92363.72 rows=660888 width=8)
> Index Cond: (collectionid = '22467'::bigint)
> (7 rows)

> My question is, what gets put into the Hash?
> I assume the with "width=8" must refer to the size of the key.

It's the data payload width, but there's some per-row overhead too, which
is more than 8 bytes ... don't recall how much more at the moment.

> The reason I ask is because, when I try to run the query it fails due to
> temp file use over 10GB.

What have you got work_mem set to? If the hashed table exceeds that,
you're going to end up with a multi-batch join, in which case most of
the outer table is going to get written into temp files as well.

I don't think it's a good idea to have such a small temp_file_limit
when you're working with such huge tables. Either a merge or hash
join is likely to require temp file space similar to the table size,
and you do *not* want a nestloop join, at least not if you want your
answers in reasonable time.

regards, tom lane

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO] 2017-10-19 13:09:42 Re: question on hash joins
Previous Message Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO] 2017-10-18 20:00:14 question on hash joins