Re: Proposal for Merge Join for Non '=' Operators

From: Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>
To: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal for Merge Join for Non '=' Operators
Date: 2014-04-09 10:55:59
Message-ID: 4205E661176A124FAF891E0A6BA913526595F4CA@SZXEML507-MBS.china.huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 09 April 2014 13:31, Nicolas Barbier Wrote
> Do you have a real-world example use case of such joins, to offset the
> extra planner time that will likely have to be paid (even for queries
> for which the functionality ends up not being used)?
>
> I guess there might be queries that join on “values that are not too
> far apart” or something, but as those cases (there typically not being
> a lot of “inner” rows that join with each “outer” row) are probably
> executed efficiently using a nested loop + index scan, I don’t see the
> point (yet). Are you aiming for the case where the inner relation is
> difficult to compute and cannot be accessed using an index scan?

I think this will be more useful when Both the relation are Big and are of almost equal size.

Here we can compare the cost of existing methods and new approach..

For such case planner can select
Either (1). NLJ [Outer(seq scan) JOIN Inner (Materialize))
OR (2). NLJ [Outer(seq scan) JOIN Inner (Index)) As you mentioned.

In approach (1), cost will be:
NLJ Cost : Outer Tuple * Inner Tuple
+ I/O cost : Seq Page Cost (Inner rel pages + outer rel pages).

* And relation can be too big for materialization.

In approach (2), Cost will be:
NLJ Cost : OuterTuple * Inner Tuple in Path { Inner Tuple in Path -> Join selectivity * Number of inner tuple}
+ I/O Cost : OuterTuple * Index Rescan Cost { Index Rescan Cost will depend upon (how many pages fetched in scan)}

* This will be costly because I/O cost will increase for doing multiple index rescan(for each outer).

In New Approach(3) Cost will be:
(since here for each outer tuple we need not to scan complete Inner Tuple.)
MJ Cost : Outer Tuple * Inner Tuple in Path + (every outer tuple need to scan some tuple before reach to qualifying tuple)
+ I/O Cost : Index Scan Cost {Only one scan}

So for This Best case cost will be : MJ Cost : Outer Tuple * Inner Tuple in Path + I/O Cost : Index Scan Cost
Worst case cost will be: MJ Cost : Outer Tuple * Inner Tuple + I/O Cost : Index Scan Cost

So for many case approach(3) can be cheaper, that can be detected by planner cost calculation.

> > selecting this new cost calculation can be implemented in planner
>
> Hmm. Of course, the difficult part will be adding support for this in
> the planner, but you don’t seem to have any plans for implementing that?
>

Yes, I have plan to implement this part, but it's not completed yet.

Thanks & Regards,
Dilip

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2014-04-09 11:23:30 Re: Pointer to structure in ECPG
Previous Message Michael Meskes 2014-04-09 10:29:05 Re: Pointer to structure in ECPG