Re: 9.3 Pre-proposal: Range Merge Join

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.3 Pre-proposal: Range Merge Join
Date: 2012-04-19 08:10:58
Message-ID: 1334823058.5487.84.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2012-04-18 at 01:21 -0400, Tom Lane wrote:
> It would be a pretty weird implementation of mergejoin that could
> "discard tuples from the middle" of an input stream. Or to be more
> specific, it wouldn't be the mergejoin itself that could do that at all
> --- you'd need the input plan node to be some sort of tweaked version of
> tuplestore or tuplesort that could respond to a request like that.

As I said in my reply to Robert, I think there are some ways we can make
this idea work.

> I can't escape the feeling that Jeff has chosen the wrong basis for his
> thought experiment, and that what he really ought to be thinking about
> is hashjoin, which keeps an in-memory table that it could easily modify
> on the fly if it chose to. The multi-batch aspect of hybrid hashjoin
> could be useful too (IOW, when you're out of better ideas, throw the
> tuple back in the water to process later).

Obviously hashing is not going to be much use for anything but equality.
So I believe this approach is very similar to the temporary-index
method, except with batching, and always keeping the index in memory.

I don't think we would get the partitioning benefit of hashjoin, because
we'd have to put the same tuple in multiple partitions, so it's probably
better to just leave the outer side intact.

But in-memory indexes and multiple passes of the outer seems like a
reasonable alternative, particularly because an in-memory index might be
very fast (to build as well as query).

> This is just handwaving of course. I think some digging in the
> spatial-join literature would likely find ideas better than any of
> these.

I will look in some more detail. The merge-like approach did seem to be
represented in the paper referenced by Alexander (the external plane
sweep), but it also refers to several methods based on partitioning.

I'm beginning to think that more than one of these ideas has merit.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2012-04-19 09:05:02 New sync commit mode remote_write
Previous Message Sandro Santilli 2012-04-19 07:56:26 Re: Gsoc2012 idea, tablesample