Re: [HACKERS] PoC: full merge join on comparison clause

From: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: [HACKERS] PoC: full merge join on comparison clause
Date: 2018-07-30 17:13:21
Message-ID: 9de15ac8-10b2-8569-4683-002db9131771@postgrespro.ru
Lists: pgsql-hackers

On 18/07/18 at 16:58, Ashutosh Bapat wrote:
>
> Thanks for the commit messages. I would use the word "in-equality" instead
> of "comparison" since equality is also a comparison.

Fixed.

> Comparing this with the original code, I think, is_mj_equality should be true
> if restrictinfo->mergeopfamilies is not NIL.

My mistake, fixed.

> With this work the meaning of oprcanmerge (See pg_operator catalog and also
> CREATE OPERATOR syntax) changes. Every btree operator can now be used to
> perform a merge join. oprcanmerge however only indicates whether an operator is
> an equality or not. Have you thought about that? Do we need to redefine
> oprcanmerge?

For now we can keep testing with the old meaning of oprcanmerge, so that
we don't have to bump the catalog version. Merge join needs only the
BTORDER_PROC support function, which is required for btree opfamilies.
This means it should always be possible to merge join on operators that
correspond to the standard btree strategies. We could set oprcanmerge to
true for all built-in btree comparison operators, and leave the
possibility of disabling it for custom operators.
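
To illustrate why a three-way ordering function is enough, here is a
minimal sketch in Python, not the code from the patch. It joins two lists
on "outer < inner" using nothing but a comparator that plays the role of
BTORDER_PROC. The names cmp_int and merge_join_less_than are hypothetical,
and the sketch assumes both inputs fit in memory and are sorted ascending.

```python
import functools


def cmp_int(x, y):
    """Three-way comparison: negative, zero or positive, like a btree ordering function."""
    return (x > y) - (x < y)


def merge_join_less_than(outer, inner, cmp):
    """Return all pairs (o, i) with o < i, using only cmp on sorted inputs."""
    key = functools.cmp_to_key(cmp)
    outer = sorted(outer, key=key)
    inner = sorted(inner, key=key)

    result = []
    start = 0  # first inner position whose value is greater than the current outer value
    for o in outer:
        # The outer side is ascending, so this boundary only ever moves forward.
        while start < len(inner) and cmp(inner[start], o) <= 0:
            start += 1
        # Every inner tuple from the boundary onwards matches: one contiguous run.
        result.extend((o, i) for i in inner[start:])
    return result


outer_rows = [1, 3, 2]
inner_rows = [2, 1, 3, 4]
joined = merge_join_less_than(outer_rows, inner_rows, cmp_int)
brute_force = [(o, i) for o in outer_rows for i in inner_rows if o < i]
print(sorted(joined) == sorted(brute_force))
```

With a single inequality clause the matching inner tuples always form a
suffix of the sorted inner input, which is why one forward-moving
boundary is sufficient here.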

> I think, it should be possible to use this technique with more than one
> inequality clause as long as all the operators require the input to be ordered
> in the same direction and the clauses are ANDed. In that case, for a given
> outer tuple the matching inner tuples form a contiguous interval.

Consider a table "t(a int, b int)" where each column takes the values 1,
2, 3, 4 and the table contains all possible combinations. If the merge
condition is "a < 2 and b < 2", then for each of the four possible sort
directions the result set won't be contiguous. Generally speaking, this
happens when there are several groups of rows, each sharing one value of
the first column that satisfies the join condition, but inside each
group the second column doesn't match for some rows.
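
The general case can be checked with a short Python sketch. It builds the
16-row table, sorts it in each of the four direction combinations, and
records where the matching rows land. The bounds are parameters; the value
3 used below is an assumption picked for illustration, so that two groups
of the first column satisfy the condition. The helper names are
hypothetical.

```python
from itertools import product


def matching_positions(rows, bound_a, bound_b):
    """Positions, in the given order, of rows with a < bound_a and b < bound_b."""
    return [pos for pos, (a, b) in enumerate(rows) if a < bound_a and b < bound_b]


def is_contiguous(positions):
    """True if the positions form one unbroken run."""
    return all(nxt - cur == 1 for cur, nxt in zip(positions, positions[1:]))


# All combinations of a and b in 1..4.
rows = list(product(range(1, 5), repeat=2))

contiguous_by_direction = {}
for a_desc, b_desc in product([False, True], repeat=2):
    # Negating a column's value sorts that column in descending order.
    ordered = sorted(
        rows,
        key=lambda r: (-r[0] if a_desc else r[0], -r[1] if b_desc else r[1]),
    )
    positions = matching_positions(ordered, 3, 3)
    contiguous_by_direction[(a_desc, b_desc)] = is_contiguous(positions)
    print(a_desc, b_desc, positions)
```

In every one of the four orderings the matches split into two runs, one
per group of the first column, with the non-matching rows of the first
group sitting between them.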

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment                               Content-Type   Size
0002-Inequality-merge-join-v10.patch     text/x-patch   49.3 KB
0001-Preparatory-refactoring-v10.patch   text/x-patch   25.5 KB
