Re: [HACKERS] PoC: full merge join on comparison clause

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: [HACKERS] PoC: full merge join on comparison clause
Date: 2018-07-09 13:12:36
Message-ID: CAFjFpRd-yz4MpFLZfgG6+PQKw-aH=9nj_RAtxJQuC=eg3Y3UXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 6, 2018 at 6:31 PM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>
> I will continue reviewing the patches.
>

Here are some more review comments

- * sort ordering for each merge key. The mergejoinable operator is an
- * equality operator in the opfamily, and the two inputs are guaranteed to be
+ * sort ordering for each merge key. The mergejoinable operator is a
+ * comparison operator in the opfamily, and the two inputs are guaranteed to be

I think this prologue has to change substantially. At the beginning of the
prologue it explicitly mentions clauses like leftexpr = rightexpr. That
needs to be changed.

* ordered in either increasing or decreasing (respectively) order according

It looks like the order of inputs is constrained by the in-equality operator.
That too needs to be specified here.

* This allows us to obtain the needed comparison function from the opfamily.
@@ -200,6 +200,9 @@ MJExamineQuals(List *mergeclauses,
Oid op_righttype;
Oid sortfunc;

+ if (parent->mj_Ineq_Present)
+ elog(ERROR, "inequality mergejoin clause must be the last one");
+

IIUC, this never happens. If it really happens, we have created a path which
can not be used practically. That should never happen. It will help to add a
comment here clarifying that situation.

+ bool have_inequality = have_inequality_mergeclause(mergeclauses);

There will be many paths created with different ordering of pathkeys. So,
instead of calling have_inequality_mergeclause() for each of those paths, it's
better to save this status in the path itself when creating the path.

/* Make a pathkey list with this guy first */
if (l != list_head(all_pathkeys))
+ {
+ if (have_inequality && l == list_tail(all_pathkeys))
+ /* Inequality merge clause must be the last, we can't
move it */
+ break;
+

I am kind of baffled by this change. IIUC the way we create different orderings
of pathkeys here, we are just rotating the pathkeys in circular order. This
means there is exactly one ordering of pathkeys where the pathkey corresponding
to the inequality clause is the last one. It's only that ordering which will be
retained and all other ordering will be discarded. Instead of that, I think we
should keep the pathkey corresponding to the inequality clause at the end (or
track in separately) and create different orderings of pathkeys by rotating
other pathkeys. This will allow us to cost the orderings as intended by this
fucntion.

/* Might have no mergeclauses */
if (nClauses == 0)
return NIL;

+ {
+ List *ineq_clauses = find_inequality_clauses(mergeclauses);
+
+ if (list_length(ineq_clauses) > 1)
+ return NIL;

Without this patch, when there is an inequality clause with FULL JOIN, we will
not create a merge join path because select_mergejoin_clauses() will set
mergejoin_allowed to false. This means that we won't call
sort_inner_and_outer(). I think this patch also has to do the same i.e. when
there are more than one inequality clauses, select_mergejoin_clauses() should
set mergejoin_allowed to false in case of a FULL JOIN since merge join
machinary won't be able to handle that case.

If we do that, we could arrange extra.mergeclause_list such that the inequality
clause is always at the end thus finding inequality clause would be easy.

Again, this is not full review, but I am diving deeper into the
patch-set and understanding it better. Sorry.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jesper Pedersen 2018-07-09 13:30:43 Re: EXPLAIN of Parallel Append
Previous Message Fabien COELHO 2018-07-09 13:05:09 Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors