Re: [HACKERS] PoC: full merge join on comparison clause

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: [HACKERS] PoC: full merge join on comparison clause
Date: 2018-07-06 13:01:08
Message-ID: CAFjFpRdPUCXOe9bcAQWnRBTU1jAtPPa8fMJrn6oAtttBj=HniA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,
I have started reviewing these patches. I haven't grasped the design
yet. But here are some comments on the first patch.

- clauses = (MergeJoinClause) palloc0(nClauses *
sizeof(MergeJoinClauseData));
+ parent->mj_Clauses = (MergeJoinClause) palloc0(nClauses *
sizeof(MergeJoinClauseData));

crosses 80 characters.

- StrategyNumber opstrategy = mergestrategies[iClause];
+ StrategyNumber sort_strategy = mergestrategies[iClause];
- int op_strategy;
+ int join_strategy;
I don't see a reason why should we change the name of variable here. These are
operator strategies and there's no need to change their names. The name change
is introducing unnecessary diffs.

+ clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args),
(PlanState *) parent);
+ clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args),
(PlanState *) parent);

cross 80 characters.

/*
@@ -378,20 +375,29 @@ MJEvalInnerValues(MergeJoinState *mergestate,
TupleTableSlot *innerslot)
return result;
}

+/* Tuple comparison result */
+typedef enum
+{
+ MJCR_NextInner = 1,
+ MJCR_NextOuter = -1,
+ MJCR_Join = 0
+} MJCompareResult;
+
/*
* MJCompare
*
- * Compare the mergejoinable values of the current two input tuples
- * and return 0 if they are equal (ie, the mergejoin equalities all
- * succeed), >0 if outer > inner, <0 if outer < inner.
+ * Compare the mergejoinable values of the current two input tuples.
+ * If they are equal, i.e., the mergejoin equalities all succeed,
+ * return MJCR_Join, if outer > inner, MJCR_NextInner, and else
+ * MJCR_NextOuter.
*
* MJEvalOuterValues and MJEvalInnerValues must already have been called
* for the current outer and inner tuples, respectively.
*/
-static int
+static MJCompareResult
MJCompare(MergeJoinState *mergestate)
{

I am not sure about this change as well. MJCompare()'s job is to compare given
keys in the two tuples and return the comparison result. The result was used as
it is to decide which side to advance in an equality based merge join. But for
inequality based merge join the result needs to be interpreted further. I think
we should write a wrapper around MJCompare which interprets the result rather
than changing MJCompare itself. OR at least change the name of MJCompare. The
first option is better in case we use MJCompare for purposes other than merge
join in future. I am not sure what those could be, but say a merge based union
or something like that.

/*
* Sweep through the existing EquivalenceClasses looking for matches to
@@ -273,7 +274,7 @@ process_equivalence(PlannerInfo *root,
/*
* A "match" requires matching sets of btree opfamilies. Use of
* equal() for this test has implications discussed in the comments
- * for get_mergejoin_opfamilies().
+ * for get_equality_opfamilies().

I think we should leave mergejoin word in there or at least indicate that these
are btree opfamilies so that we don't confuse it with hash equality operator
families.

It will be good if you can write something about why these changes are
required in the file. If you are using git format-patch, you could
write a commit message that gets added to the patch. That way, it
leaves there for anybody to review.

I am having a difficult time reading the next patch. There are various
changes in the second patch, which I don't understand the reason
behind. I think some comments will help, in as commit message or in
the code.

I will continue reviewing the patches.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Kuzmenkov 2018-07-06 13:08:21 Re: Generating partitioning tuple conversion maps faster
Previous Message Antonin Houska 2018-07-06 12:58:40 Re: Push down Aggregates below joins