Re: Proposal for Merge Join for Non '=' Operators

From: Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal for Merge Join for Non '=' Operators
Date: 2014-04-10 08:51:25
Message-ID: 4205E661176A124FAF891E0A6BA913526595FA87@SZXEML507-MBS.china.huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09 April 2014 19:14, Tom Lane Wrote:
>
> I'm not following how this would help much. Consider "a.x < b.y" where
> we know that the inputs are sorted by x/y. I think you are saying that
> for a given b row, we could scan forward from the beginning of a, but
> stop as soon as we find a row with a.x >= b.y. Great ... but instead
> of comparing M*N rows, we are comparing M*N/2 rows. So it's still
> O(N^2), though with a slightly smaller constant. Is this going to be
> worth the added time to sort the inputs?
>
> It's slightly more promising for range constraints (that is, "a.x
> between b.y and b.z") but even then you have to assume that the ranges
> are narrow, at which point a nestloop join using an inner indexscan on
> a.x might do about as well.
>
> So personally, I suspect there's a reason why this isn't a standard
> join algorithm already. If you want to try it, I'd suggest that you
> focus on getting to where you can do some performance testing ASAP,
> without doing much code polishing or worrying about teaching the
> planner how to cost it.

As per your suggestion have done quick hack and done the performance testing for one specific scenario.

I shall perform some more test, for that I need to do some more hack in the code and I will post them soon..

Test Scenario:
Create table t1 (a int, b int);
Create table t2 (a int, b int);

Random record inserted in t1 and t2, as per attached files. (10K records are inserted in both the tables)

Performance is taken for the query : select count(*) from t1,t2 where t1.b < t2.b;

Test Result:
Nest Loop Join : Time: 36038.842 ms
Merge Join : Time: 19774.975 ms

Number of record selected: 42291979

Thanks & Regards,
Dilip

Attachment Content-Type Size
t2.sql application/octet-stream 356.1 KB
t1.sql application/octet-stream 358.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-04-10 09:21:16 Re: WAL replay bugs
Previous Message sachin kotwal 2014-04-10 07:52:42 Re: WAL replay bugs