PoC: full merge join on comparison clause

From: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: PoC: full merge join on comparison clause
Date: 2017-05-12 11:09:32
Message-ID: b31e1a2d-5ed2-cbca-649e-136f1a7c4c31@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

As you know, at this time Postgres cannot perform a full join on a
comparison clause. For example, if we have two tables with numeric
columns and run a query like 'select * from t1 full join t2 on t1.a >
t2.a', we get an error: "FULL JOIN is only supported with merge-joinable
or hash-joinable join conditions". Such queries are legitimate SQL and
sometimes arise when porting applications from different DBMS, so it
would be good to support them in Postgres. They can be rewritten as
union of right and left joins, but it requires manual work and runs
somewhat slower (see the benchmark at the end of the letter). This
proof-of-concept patch explores the possibility of performing such
queries as merge joins.

Consider the following example where outer and inner relations are in
ascending order, and we are asked to return outer tuples that are
greater than inner.
outer > inner
outer tuple - 6 4 - marked tuple
7 5
8 6 - inner tuple
8 7

The main difference from normal merge join is that we do not need to
advance the marked tuple. This behavior can be implemented with some
simple changes to the function that compares inner and outer tuples.
However, for the join clause 'outer < inner' we would have to advance
the marked tuple, which would require adding a new state to the merge
join executor node. We do not do this. Instead, at the path creation
stage, we make sure that the particular combination of sorting order and
join clause allows us to perform merge join the simple way.

The optimizer requires some other changes to support these joins.
Currently, it uses the same clauses to generate equivalence classes and
to perform merge joins. This patch has to separate these two uses.
Clauses that correspond to a btree equality operator are used to
construct equivalence classes; the operator families for these clauses
are recorded in the 'equivopfamilies' field of RestrictInfo struct.
Clauses that correspond to btree equality or comparison are used to
perform merge joins, and have their operator families recorded in the
'mergeopfamilies'.

The optimizer also has to check whether the particular join clause list
can be used for merge join, and ensure that it is compatible with
inner/outer path ordering. These checks are performed by
'can_sort_for_mergejoin()' and 'outer_sort_suitable_for_mergejoin()'.

There is an important unsolved problem in this patch. When generating
index paths for base relations, the optimizer tries to use only one scan
direction to limit the number of paths. This direction might not be
suitable for a given join clause, and the input path will have to be
sorted. We could generate paths for both directions, but this was
specifically removed for optimization (SHA 834ddc62 by Tom Lane,
10/27/2007 09:45 AM).

For inner joins one would expect the merge join to be slower than the
nested loop, because it has more complex code paths, and indeed this can
be seen on simple benchmarks (see the end of the letter). Costs should
be revised further to reflect this difference.

I would be glad to hear your opinion on this approach.

Some benchmarks:

===== Full join vs union of left and right joins
========================================

test1=# explain analyze select * from t4 right join t1 on t4.a < t1.a
union all select * from t4 left join t1 on t4.a < t1.a where t1.a is null;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Append (cost=809.69..70703.19 rows=3340000 width=8) (actual
time=8.336..1195.534 rows=5007546 loops=1)
-> Merge Left Join (cost=809.69..34230.49 rows=3333333 width=8)
(actual time=8.335..920.442 rows=5007537 loops=1)
Merge Cond: (t1.a > t4.a)
-> Index Only Scan using idx_t1_a on t1 (cost=0.28..35.27
rows=1000 width=4) (actual time=0.027..0.395 rows=1001 loops=1)
Heap Fetches: 97
-> Sort (cost=809.39..834.39 rows=10000 width=4) (actual
time=8.300..356.821 rows=5007538 loops=1)
Sort Key: t4.a
Sort Method: quicksort Memory: 931kB
-> Seq Scan on t4 (cost=0.00..145.00 rows=10000
width=4) (actual time=0.019..2.533 rows=10000 loops=1)
-> Nested Loop Anti Join (cost=0.28..3072.71 rows=6667 width=8)
(actual time=4.685..35.421 rows=9 loops=1)
-> Seq Scan on t4 t4_1 (cost=0.00..145.00 rows=10000
width=4) (actual time=0.010..0.656 rows=10000 loops=1)
-> Index Only Scan using idx_t1_a on t1 t1_1 (cost=0.28..6.10
rows=333 width=4) (actual time=0.003..0.003 rows=1 loops=10000)
Index Cond: (a > t4_1.a)
Heap Fetches: 971
Planning time: 1.414 ms
Execution time: 1324.985 ms
(16 rows)

test1=# explain analyze select * from t4 full join t1 on t4.a < t1.a;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Merge Full Join (cost=809.66..34230.49 rows=3333333 width=8) (actual
time=8.351..914.590 rows=5007546 loops=1)
Merge Cond: (t1.a > t4.a)
-> Index Only Scan using idx_t1_a on t1 (cost=0.28..35.27
rows=1000 width=4) (actual time=0.035..0.368 rows=1001 loops=1)
Heap Fetches: 97
-> Sort (cost=809.39..834.39 rows=10000 width=4) (actual
time=8.309..347.851 rows=5007546 loops=1)
Sort Key: t4.a
Sort Method: quicksort Memory: 931kB
-> Seq Scan on t4 (cost=0.00..145.00 rows=10000 width=4)
(actual time=0.020..2.563 rows=10000 loops=1)
Planning time: 1.083 ms
Execution time: 1044.869 ms
(10 rows)

=== Merge vs nested loop ===========================================

test1=# explain analyze select * from t5 join t1 on t5.a <= t1.a;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.28..944713.00 rows=33333333 width=8) (actual
time=0.055..8718.840 rows=50014145 loops=1)
-> Seq Scan on t5 (cost=0.00..1443.00 rows=100000 width=4) (actual
time=0.019..6.428 rows=100000 loops=1)
-> Index Only Scan using idx_t1_a on t1 (cost=0.28..6.10 rows=333
width=4) (actual time=0.003..0.050 rows=500 loops=100000)
Index Cond: (a >= t5.a)
Heap Fetches: 9147995
Planning time: 2.209 ms
Execution time: 9942.176 ms
(7 rows)

test1=# set enable_mergejoin TO on;
SET
test1=# explain analyze select * from t5 join t1 on t5.a <= t1.a;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=9748.54..343618.88 rows=33333333 width=8) (actual
time=35.491..9281.482 rows=50014145 loops=1)
Merge Cond: (t1.a >= t5.a)
-> Index Only Scan using idx_t1_a on t1 (cost=0.28..35.27
rows=1000 width=4) (actual time=0.027..0.769 rows=1001 loops=1)
Heap Fetches: 97
-> Sort (cost=9747.82..9997.82 rows=100000 width=4) (actual
time=35.458..3906.652 rows=50014145 loops=1)
Sort Key: t5.a
Sort Method: quicksort Memory: 8541kB
-> Seq Scan on t5 (cost=0.00..1443.00 rows=100000 width=4)
(actual time=0.013..8.570 rows=100000 loops=1)
Planning time: 2.368 ms
Execution time: 10530.356 ms
(10 rows)

--
Alexander Kuzmenkov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
full-merge-join-v1.patch text/x-diff 47.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2017-05-12 11:13:27 Re: WITH clause in CREATE STATISTICS
Previous Message Beena Emerson 2017-05-12 11:03:17 Re: Adding support for Default partition in partitioning