Re: HashJoin w/option to unique-ify inner rel

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: HashJoin w/option to unique-ify inner rel
Date: 2009-04-17 01:43:11
Message-ID: 603c8f070904161843g457c54bqe252adbf899e69c0@mail.gmail.com
Lists: pgsql-hackers

> If HashAggregate is faster, then the question is can you make it better
> by avoiding building the hash structure twice.  I haven't considered all
> the possibilities, but the situation you have used as an example, an IN
> query, seems workable.  Instead of translating to a hash
> aggregate/hash/hash join query plan, it may be possible to create a
> special hash join node that unique-ifies.

Yeah, that's what I was looking at. The problem is that unique-ifying
is not free either - we have to invoke the appropriate comparison
operators for every tuple in the bucket whose hash value matches
exactly. So, for example, if the input has K copies each of N items,
I'll need to do (K - 1) * N comparisons, assuming no hash collisions.
In return, the number of tuples in each bucket is reduced by a factor
of K, but that doesn't actually save very much, because I can reject
all of those with an integer comparison of the hash values anyway,
again assuming no hash collisions, so it's pretty cheap.
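To make the cost concrete, here is a hypothetical sketch (plain Python, not PostgreSQL executor code) that counts the equality comparisons a hash-based unique-ify pass performs on a stream of K copies each of N distinct keys, under the same no-collision assumption: each distinct key lands in its own bucket, so every duplicate costs exactly one full comparison against the single retained entry.

```python
# Hypothetical model of hash-based unique-ification (not PostgreSQL code).
# Assumes no hash collisions: distinct keys never share a bucket.

def uniqueify_comparisons(n_items, k_copies):
    buckets = {}        # hash value -> list of retained (unique) keys
    comparisons = 0
    for key in range(n_items):
        for _ in range(k_copies):
            h = hash(key)                # stand-in for the hash function
            bucket = buckets.setdefault(h, [])
            for existing in bucket:      # full comparison on hash match
                comparisons += 1
                if existing == key:
                    break                # duplicate: discard
            else:
                bucket.append(key)       # first copy: retain
    return comparisons

# K copies of each of N keys => (K - 1) * N comparisons:
print(uniqueify_comparisons(10, 3))   # (3 - 1) * 10 = 20
```

The first copy of each key finds an empty bucket (zero comparisons); each of the remaining K - 1 copies pays one comparison, giving the (K - 1) * N total from the paragraph above.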

If the hash join were on track to go multi-batch, then unique-ifying it
on the fly would make a lot of sense... otherwise, I'm not sure it's
really going to be a win. Anyhow, further analysis needed...
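The multi-batch case can be sketched with a back-of-envelope check (the numbers below are illustrative assumptions, not from the thread): deduplicating on the fly pays off when it shrinks the build side by enough to fit back under work_mem, so the join avoids spilling to multiple batches.

```python
# Hypothetical sizing check: does the build side of a hash join exceed
# the memory budget? All figures below are illustrative assumptions.

def needs_multiple_batches(n_distinct, k_copies, tuple_bytes, work_mem_bytes):
    return n_distinct * k_copies * tuple_bytes > work_mem_bytes

WORK_MEM = 4 * 1024 * 1024   # 4 MB, PostgreSQL's default work_mem

# 100k distinct values, 10 copies each, ~40 bytes per hashed tuple:
print(needs_multiple_batches(100_000, 10, 40, WORK_MEM))  # True: spills
print(needs_multiple_batches(100_000, 1, 40, WORK_MEM))   # False: fits
```

In this illustrative case the raw inner rel (40 MB) would go multi-batch, while the deduplicated version (4 MB) fits in memory, so paying the (K - 1) * N comparisons up front spares a round of batch I/O.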

...Robert
