Re: Proposal for Merge Join for Non '=' Operators

From: Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>
To: Hadi Moshayedi <hadi(at)moshayedi(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Proposal for Merge Join for Non '=' Operators
Date: 2014-05-02 09:28:33
Message-ID: 4205E661176A124FAF891E0A6BA91352663061AD@szxeml509-mbs.china.huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29 April 2014 13:28, Hadi Moshayedi Wrote,

>This looks like a great improvement. Repeating Nicolas's question, do you have a real-world example of such joins?

I can think of some scenario where, user need to self-join and find the comparison with other tuples, For example, list down all the employee which has less salary compare to others employees and count of the employees who are earning more than that emp. Like query given below

“select ta.emp_name, count(*) from t1 as ta, t1 as tb where ta.emp_salary<tb.emp_salary group by ta.emp_name;”

>In my experience, I see more queries like "self-join table A and table B where A.time BETWEEN B.time - '1 week' and B.time", similar to what Nicolas and Tom mentioned. As an example, "count users who placed an order in the week following their registration".

Currently I have implemented very basic POC which can work only for a < b query, I think actual patch can be enhanced for these type of queries also.

>Can you send a patch so we can also try it?

Patch is attached in the mail, but for testing we need to take care of some points

1. Patch is implemented only for a<b type of queries (only for table with one integer field, this can be modified in create_nestloop_plan if needed, I have written for basic test with integer).

2. What changes are done

There is no changes done in planner cost calculation, so hack is put while generating the plan.

IF planner has selected NLJ plan, and enable material is set to off (this is the hint to select special Merge Join)

Then add sort node above left and right tree for NLJ.

3. So if you want to test with normal NLJ no need to change anything, and if you want to test using this merge join just run ‘set enable_material=off’;

postgres=# explain select count(*) from t1 as ta, t1 as tb where ta.a<tb.a;
QUERY PLAN
---------------------------------------------------------------------------
Aggregate (cost=396625.51..396625.52 rows=1 width=0)
-> Nested Loop (cost=0.00..375758.83 rows=8346672 width=0)
Join Filter: (ta.a < tb.a)
-> Seq Scan on t1 ta (cost=0.00..73.04 rows=5004 width=4)
-> Materialize (cost=0.00..98.06 rows=5004 width=4)
-> Seq Scan on t1 tb (cost=0.00..73.04 rows=5004 width=4)
Planning time: 0.291 ms
(7 rows)

Now For enabling this merge Join
postgres=# set enable_material=off;
SET
postgres=# explain select count(*) from t1 as ta, t1 as tb where ta.a<tb.a;
QUERY PLAN
---------------------------------------------------------------------------
Aggregate (cost=699432.08..699432.09 rows=1 width=0)
-> Nested Loop (cost=0.00..678565.40 rows=8346672 width=0)
Join Filter: (ta.a < tb.a)
-> Sort (cost=380.51..393.02 rows=5004 width=4)
Sort Key: ta.a
-> Seq Scan on t1 ta (cost=0.00..73.04 rows=5004 width=4)
-> Sort (cost=380.51..393.02 rows=5004 width=4)
Sort Key: tb.a
-> Seq Scan on t1 tb (cost=0.00..73.04 rows=5004 width=4)
Planning time: 0.286 ms
(10 rows)

Thanks & Regards,
Dilip Kumar

Attachment Content-Type Size
merge_join_nonequal.patch application/octet-stream 7.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Christoph Berg 2014-05-02 13:53:47 Re: includedir_internal headers are not self-contained
Previous Message Amit Langote 2014-05-02 07:30:54 Re: Allowing empty target list in SELECT (1b4f7f93b4693858cb983af3cd557f6097dab67b)