Re: Research/Implementation of Nested Loop Join optimization

From: "Manoel Henrique" <mhenriquesgbd(at)gmail(dot)com>
To: "Dann Corbit" <DCorbit(at)connx(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Research/Implementation of Nested Loop Join optimization
Date: 2008-07-23 20:47:22
Message-ID: 5b27a1ad0807231347o53a08dccj26bea6b04e541448@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The nodeMergejoin.c is the code for the Merge Join isn`t it? I am trying to
find a way to change the Nested Loop Join, It would be more like on
nodeNestloop.c when rescanning the inner plan, (second time scanning the
inner plan and so on) he`d change the scan direction, If the scan direction
was from first tuple to last tuple it would go backwards, if it was from
last to first it would go forward... The code I`m looking atm is from 8.3.1
, seems to have some kind of direction manager but doesn`t seems to be in
use.

--Manoel

On Wed, Jul 23, 2008 at 5:23 PM, Dann Corbit <DCorbit(at)connx(dot)com> wrote:

> When you install the source tree (e.g. in folder \postgresql-8.3.x) you
> will want to examine nodeMergejoin.c typically found in a path similar to
> this:
>
> \postgresql-8.3.x\src\backend\executor\nodeMergejoin.c
>
>
>
> Here are the comments from the version on my machine:
>
>
>
> /*
>
> * INTERFACE ROUTINES
>
> * ExecMergeJoin
> mergejoin outer and inner relations.
>
> * ExecInitMergeJoin
> creates and initializes run time states
>
> * ExecEndMergeJoin
> cleans up the node.
>
> *
>
> * NOTES
>
> *
>
> * Merge-join is done by joining the inner and
> outer tuples satisfying
>
> * join clauses of the form ((= outerKey
> innerKey) ...).
>
> * The join clause list is provided by the query
> planner and may contain
>
> * more than one (= outerKey innerKey) clause
> (for composite sort key).
>
> *
>
> * However, the query executor needs to know
> whether an outer
>
> * tuple is "greater/smaller" than an inner
> tuple so that it can
>
> * "synchronize" the two relations. For example,
> consider the following
>
> * relations:
>
> *
>
> * outer: (0 ^1
> 1 2 5 5 5 6 6 7) current tuple: 1
>
> * inner: (1 ^3
> 5 5 5 5 6) current tuple: 3
>
> *
>
> * To continue the merge-join, the executor
> needs to scan both inner
>
> * and outer relations till the matching tuples
> 5. It needs to know
>
> * that currently inner tuple 3 is "greater"
> than outer tuple 1 and
>
> * therefore it should scan the outer relation
> first to find a
>
> * matching tuple and so on.
>
> *
>
> * Therefore, rather than directly executing the
> merge join clauses,
>
> * we evaluate the left and right key
> expressions separately and then
>
> * compare the columns one at a time (see
> MJCompare). The planner
>
> * passes us enough information about the sort
> ordering of the inputs
>
> * to allow us to determine how to make the
> comparison. We may use the
>
> * appropriate btree comparison function, since
> Postgres' only notion
>
> * of ordering is specified by btree opfamilies.
>
> *
>
> *
>
> * Consider the above relations and suppose that
> the executor has
>
> * just joined the first outer "5" with the last
> inner "5". The
>
> * next step is of course to join the second
> outer "5" with all
>
> * the inner "5's". This requires repositioning
> the inner "cursor"
>
> * to point at the first inner "5". This is done
> by "marking" the
>
> * first inner 5 so we can restore the "cursor"
> to it before joining
>
> * with the second outer 5. The access method
> interface provides
>
> * routines to mark and restore to a tuple.
>
> *
>
> *
>
> * Essential operation of the merge join
> algorithm is as follows:
>
> *
>
> * Join {
>
> * get initial outer and inner
> tuples
> INITIALIZE
>
> * do forever {
>
> * while (outer
> != inner) {
> SKIP_TEST
>
> *
> if (outer < inner)
>
> *
> advance
> outer
> SKIPOUTER_ADVANCE
>
> *
> else
>
> *
> advance
> inner
> SKIPINNER_ADVANCE
>
> * }
>
> * mark inner
> position
> SKIP_TEST
>
> * do forever {
>
> *
> while (outer == inner) {
>
> *
> join
> tuples
> JOINTUPLES
>
> *
> advance inner position
> NEXTINNER
>
> *
> }
>
> *
> advance outer
> position
> NEXTOUTER
>
> *
> if (outer ==
> mark)
> TESTOUTER
>
> *
> restore inner position to mark TESTOUTER
>
> *
> else
>
> *
> break // return to top of outer loop
>
> * }
>
> * }
>
> * }
>
> *
>
> * The merge join operation is coded in the
> fashion
>
> * of a state machine. At each state, we do
> something and then
>
> * proceed to another state. This state is
> stored in the node's
>
> * execution state information and is preserved
> across calls to
>
> * ExecMergeJoin. -cim 10/31/89
>
> */
>
>
>
> *From:* pgsql-hackers-owner(at)postgresql(dot)org [mailto:
> pgsql-hackers-owner(at)postgresql(dot)org] *On Behalf Of *Manoel Henrique
> *Sent:* Wednesday, July 23, 2008 1:17 PM
> *To:* pgsql-hackers(at)postgresql(dot)org
> *Subject:* [HACKERS] Research/Implementation of Nested Loop Join
> optimization
>
>
>
> Hi! I`m a researcher from PUC-Rio (Brazil) and we`re studying about an
> Joins, and we`d like to implement an optimization on the Nested Loop Join,
> this optimization consists on while scanning the inner table, the iteration
> would go from up-down then backwards(down-up) to take advantage of the
> buffer pages in memory.
>
> We`d work with MaterialScan and only NestedLoop (we`re dropping all indexes
> and keys to make it this way). The research objective is to show some
> students how a DBMS works.
>
>
> Does PostgreSQL already works this way?
>
> Is it possible to implement such thing? Is it easy? how hard?
>
>
>
>
>
> Thank you in advance,
>
> Manoel Henrique Souza.
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-07-23 20:51:42 Re: Postgres-R: internal messaging
Previous Message Alvaro Herrera 2008-07-23 20:44:30 Re: [PATCHES] GIN improvements