Re: a few crazy ideas about hash joins

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-03 20:29:25
Message-ID: 21826.1238790565@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I don't see why hash_inner_and_outer can't walk the outer path looking
> for suitable hashes to reuse. I think the question is how aggressive
> we want to be in performing that search.

Correct, but you've got the details all wrong. The real problem is that
the planner might discard a join path hash(A,B) at level 2 because it
loses compared to, say, merge(A,B). But when we get to level three,
perhaps hash(hash(A,B),C) would've been the best plan due to synergy
of the two hashes. We'll never find that out unless we keep the
"inferior" hash path around. We can certainly do that; the question
is what's it going to cost us to allow more paths to survive to the
next join level. (And I'm afraid the answer may be "plenty"; it's a
combinatorial explosion we're looking at here.)

>> Incidentally a similar optimization is possible for materialize or
>> even sorts.

The planner already goes to great lengths to avoid extra sorts for
multiple levels of mergejoin ... that's what all the "pathkey" thrashing
is about. It's pretty expensive.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2009-04-03 20:40:41 Re: reloptions with a "namespace"
Previous Message Robert Haas 2009-04-03 20:15:10 Re: a few crazy ideas about hash joins