Re: [HACKERS] PoC: full merge join on comparison clause

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: [HACKERS] PoC: full merge join on comparison clause
Date: 2018-07-18 13:58:58
Message-ID: CAFjFpRfAxa6CEKVA4E0mYXY+R+=2+o7H+Yk5h3PMfdRx5cTYvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 10, 2018 at 10:06 PM, Alexander Kuzmenkov
<a(dot)kuzmenkov(at)postgrespro(dot)ru> wrote:
> I tried to fix the things you mentioned and improve the comments. Among
> other changes, there is now a description of how merge join works with
> inequalities at the top of nodeMergejoin.c. It also explains why we only
> support one inequality clause.

Thanks for the commit messages. I would use word "in-equality" instead
of "comparison" since equality is also a comparison.

>
> Some particular points:
>
> On 07/06/2018 04:01 PM, Ashutosh Bapat wrote:
>>
>> - StrategyNumber opstrategy = mergestrategies[iClause];
>> + StrategyNumber sort_strategy = mergestrategies[iClause];
>> - int op_strategy;
>> + int join_strategy;
>> I don't see a reason why should we change the name of variable here. These
>> are
>> operator strategies and there's no need to change their names. The name
>> change
>> is introducing unnecessary diffs.
>
>
> These variables have different meaning but their names differ only with an
> underscore. When I had to change this function, I made mistakes because of
> this. I'd keep the descriptive names to avoid further confusion. Should this
> be a separate patch?

No, 0001 suffice. But I am still not sure that the variable name
change is worth the trouble. Anyway, will leave this for a committer
to judge.

>
> This is just a cross-check for the planner. Added a comment. We should
> probably use a separate error code for internal errors as opposed to user
> errors, but I'm not sure if we have one, I see just elog(ERROR) being used
> everywhere.

elog(ERROR) is fine. Thanks for the comments.

- if (op_mergejoinable(opno, exprType(leftarg)) &&
- !contain_volatile_functions((Node *) clause))
- restrictinfo->mergeopfamilies = get_mergejoin_opfamilies(opno);
+ if (!contain_volatile_functions((Node *) clause))
+ {
+ if (op_mergejoinable_equality(opno, exprType(leftarg)))

Why is this condition split. Also why is the change in the order of conditions?

+ {
+ restrictinfo->mergeopfamilies =
get_btree_equality_opfamilies(opno);
+ restrictinfo->is_mj_equality = true;

Comparing this with the original code, I think, is_mj_equality should be true
if restrictinfo->mergeopfamilies is not NIL. There is no way that a clause can
act as an equality clause when there are no families in which the operator is
an equality operator. If restrictinfo->mergeopfamilies can not be NIL here,
probably we should add an Assert and a bit of explanation as to why
is_mj_equality is true.

With this work the meaning of oprcanmerge (See pg_operator catalog and also
CREATE OPERATOR syntax) changes. Every btree operator can now be used to
perform a merge join. oprcanmerge however only indicates whether an operator is
an equality or not. Have you thought about that? Do we require to re-define
oprcanmerge?

+ *
+ * If the inequality clause is not the last one, or if there
are several
+ * of them, this algorithm doesn't work, because it is not possible to
+ * sort the inputs in such a way that given an outer tuple,
the matching
+ * inner tuples form a contiguous interval.

I think, it should be possible to use this technique with more than one
inequality clauses as long as all the operators require the input to be ordered
in the same direction and the clauses are ANDed. In that case the for a given
outer tuple the matching inner tuples form a contiguous interval.

I think it's better to straighten out these things before diving
further into the 2nd patch.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-07-18 14:08:37 Re: [HACKERS] logical decoding of two-phase transactions
Previous Message Tom Lane 2018-07-18 13:32:52 Re: why partition pruning doesn't work?