Re: Research/Implementation of Nested Loop Join optimization

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Manoel Henrique" <mhenriquesgbd(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Research/Implementation of Nested Loop Join optimization
Date: 2008-07-23 20:23:25
Message-ID: D425483C2C5C9F49B5B7A41F8944154701000FA1@postal.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

When you install the source tree (e.g. in folder \postgresql-8.3.x) you
will want to examine nodeMergejoin.c typically found in a path similar
to this:

\postgresql-8.3.x\src\backend\executor\nodeMergejoin.c

Here are the comments from the version on my machine:

/*

* INTERFACE ROUTINES

* ExecMergeJoin
mergejoin outer and inner relations.

* ExecInitMergeJoin
creates and initializes run time states

* ExecEndMergeJoin
cleans up the node.

*

* NOTES

*

* Merge-join is done by joining the inner
and outer tuples satisfying

* join clauses of the form ((= outerKey
innerKey) ...).

* The join clause list is provided by the
query planner and may contain

* more than one (= outerKey innerKey) clause
(for composite sort key).

*

* However, the query executor needs to know
whether an outer

* tuple is "greater/smaller" than an inner
tuple so that it can

* "synchronize" the two relations. For
example, consider the following

* relations:

*

* outer: (0
^1 1 2 5 5 5 6 6 7) current tuple: 1

* inner: (1
^3 5 5 5 5 6) current tuple: 3

*

* To continue the merge-join, the executor
needs to scan both inner

* and outer relations till the matching
tuples 5. It needs to know

* that currently inner tuple 3 is "greater"
than outer tuple 1 and

* therefore it should scan the outer
relation first to find a

* matching tuple and so on.

*

* Therefore, rather than directly executing
the merge join clauses,

* we evaluate the left and right key
expressions separately and then

* compare the columns one at a time (see
MJCompare). The planner

* passes us enough information about the
sort ordering of the inputs

* to allow us to determine how to make the
comparison. We may use the

* appropriate btree comparison function,
since Postgres' only notion

* of ordering is specified by btree
opfamilies.

*

*

* Consider the above relations and suppose
that the executor has

* just joined the first outer "5" with the
last inner "5". The

* next step is of course to join the second
outer "5" with all

* the inner "5's". This requires
repositioning the inner "cursor"

* to point at the first inner "5". This is
done by "marking" the

* first inner 5 so we can restore the
"cursor" to it before joining

* with the second outer 5. The access method
interface provides

* routines to mark and restore to a tuple.

*

*

* Essential operation of the merge join
algorithm is as follows:

*

* Join {

* get initial outer and
inner tuples
INITIALIZE

* do forever {

* while
(outer != inner) {
SKIP_TEST

*
if (outer < inner)

*
advance outer
SKIPOUTER_ADVANCE

*
else

*
advance inner
SKIPINNER_ADVANCE

* }

* mark inner
position
SKIP_TEST

* do forever
{

*
while (outer == inner) {

*
join tuples
JOINTUPLES

*
advance inner position
NEXTINNER

*
}

*
advance outer position
NEXTOUTER

*
if (outer == mark)
TESTOUTER

*
restore inner position to mark TESTOUTER

*
else

*
break // return to top of outer loop

* }

* }

* }

*

* The merge join operation is coded in the
fashion

* of a state machine. At each state, we do
something and then

* proceed to another state. This state is
stored in the node's

* execution state information and is
preserved across calls to

* ExecMergeJoin. -cim 10/31/89

*/

From: pgsql-hackers-owner(at)postgresql(dot)org
[mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Manoel Henrique
Sent: Wednesday, July 23, 2008 1:17 PM
To: pgsql-hackers(at)postgresql(dot)org
Subject: [HACKERS] Research/Implementation of Nested Loop Join
optimization

Hi! I`m a researcher from PUC-Rio (Brazil) and we`re studying about an
Joins, and we`d like to implement an optimization on the Nested Loop
Join, this optimization consists on while scanning the inner table, the
iteration would go from up-down then backwards(down-up) to take
advantage of the buffer pages in memory.

We`d work with MaterialScan and only NestedLoop (we`re dropping all
indexes and keys to make it this way). The research objective is to show
some students how a DBMS works.

Does PostgreSQL already works this way?

Is it possible to implement such thing? Is it easy? how hard?

Thank you in advance,

Manoel Henrique Souza.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Markus Wanner 2008-07-23 20:26:52 Re: Postgres-R: internal messaging
Previous Message Tom Lane 2008-07-23 20:20:08 Re: pltcl_*mod commands are broken on Solaris 10