d25ea01275 and partitionwise join

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: d25ea01275 and partitionwise join
Date: 2019-07-02 09:28:58
Message-ID: CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tom,

I think an assumption of d25ea01275 breaks partitionwise join. Sorry
it took me a while to report it.

In https://postgr.es/m/8168.1560446056@sss.pgh.pa.us, Tom wrote:
> I poked into this and found the cause. For the sample query, we have
> an EquivalenceClass containing the expression
> COALESCE(COALESCE(Var_1_1, Var_2_1), Var_3_1)
> where each of the Vars belongs to an appendrel parent.
> add_child_rel_equivalences() needs to add expressions representing the
> transform of that to each child relation. That is, if the children
> of table 1 are A1 and A2, of table 2 are B1 and B2, and of table 3
> are C1 and C2, what we'd like to add are the expressions
> COALESCE(COALESCE(Var_A1_1, Var_2_1), Var_3_1)
> COALESCE(COALESCE(Var_A2_1, Var_2_1), Var_3_1)
> COALESCE(COALESCE(Var_1_1, Var_B1_1), Var_3_1)
> COALESCE(COALESCE(Var_1_1, Var_B2_1), Var_3_1)
> COALESCE(COALESCE(Var_1_1, Var_2_1), Var_C1_1)
> COALESCE(COALESCE(Var_1_1, Var_2_1), Var_C2_1)
> However, what it's actually producing is additional combinations for
> each appendrel after the first, because each call also mutates the
> previously-added child expressions. So in this example we also get
> COALESCE(COALESCE(Var_A1_1, Var_B1_1), Var_3_1)
> COALESCE(COALESCE(Var_A2_1, Var_B2_1), Var_3_1)
> COALESCE(COALESCE(Var_A1_1, Var_2_1), Var_C1_1)
> COALESCE(COALESCE(Var_A2_1, Var_2_1), Var_C2_1)
> COALESCE(COALESCE(Var_A1_1, Var_B1_1), Var_C1_1)
> COALESCE(COALESCE(Var_A2_1, Var_B2_1), Var_C2_1)
> With two appendrels involved, that's O(N^2) expressions; with
> three appendrels, more like O(N^3).
>
> This is by no means specific to FULL JOINs; you could get the same
> behavior with join clauses like "WHERE t1.a + t2.b + t3.c = t4.d".
>
> These extra expressions don't have any use, since we're not
> going to join the children directly to each other.

...unless partition wise join thinks they can be joined. Partition
wise join can't handle 3-way full joins today, but only because it's
broken itself when trying to match a full join clause to the partition
key due to one side being a COALESCE expression. Consider this
example query:

-- p is defined as:
-- create table p (a int) partition by list (a);
-- create table p1 partition of p for values in (1);
-- create table p2 partition of p for values in (2);
explain select * from p t1 full outer join p t2 using (a) full outer
join p t3 using (a) full outer join p t4 using (a) order by 1;
QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Sort (cost=16416733.32..16628145.85 rows=84565012 width=4)
Sort Key: (COALESCE(COALESCE(COALESCE(t1.a, t2.a), t3.a), t4.a))
-> Merge Full Join (cost=536957.40..1813748.77 rows=84565012 width=4)
Merge Cond: (t4.a = (COALESCE(COALESCE(t1.a, t2.a), t3.a)))
-> Sort (cost=410.57..423.32 rows=5100 width=4)
Sort Key: t4.a
-> Append (cost=0.00..96.50 rows=5100 width=4)
-> Seq Scan on p1 t4 (cost=0.00..35.50 rows=2550 width=4)
-> Seq Scan on p2 t4_1 (cost=0.00..35.50
rows=2550 width=4)
-> Materialize (cost=536546.83..553128.21 rows=3316275 width=12)
-> Sort (cost=536546.83..544837.52 rows=3316275 width=12)
Sort Key: (COALESCE(COALESCE(t1.a, t2.a), t3.a))
-> Merge Full Join (cost=14254.85..64024.48
rows=3316275 width=12)
Merge Cond: (t3.a = (COALESCE(t1.a, t2.a)))
-> Sort (cost=410.57..423.32 rows=5100 width=4)
Sort Key: t3.a
-> Append (cost=0.00..96.50
rows=5100 width=4)
-> Seq Scan on p1 t3
(cost=0.00..35.50 rows=2550 width=4)
-> Seq Scan on p2 t3_1
(cost=0.00..35.50 rows=2550 width=4)
-> Sort (cost=13844.29..14169.41
rows=130050 width=8)
Sort Key: (COALESCE(t1.a, t2.a))
-> Merge Full Join
(cost=821.13..2797.38 rows=130050 width=8)
Merge Cond: (t1.a = t2.a)
-> Sort (cost=410.57..423.32
rows=5100 width=4)
Sort Key: t1.a
-> Append
(cost=0.00..96.50 rows=5100 width=4)
-> Seq Scan on p1
t1 (cost=0.00..35.50 rows=2550 width=4)
-> Seq Scan on p2
t1_1 (cost=0.00..35.50 rows=2550 width=4)
-> Sort (cost=410.57..423.32
rows=5100 width=4)
Sort Key: t2.a
-> Append
(cost=0.00..96.50 rows=5100 width=4)
-> Seq Scan on p1
t2 (cost=0.00..35.50 rows=2550 width=4)
-> Seq Scan on p2
t2_1 (cost=0.00..35.50 rows=2550 width=4)

-- turn on enable_partitionwise_join
set enable_partitionwise_join to on;
explain select * from p t1 full outer join p t2 using (a) full outer
join p t3 using (a) full outer join p t4 using (a) order by 1;
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Sort (cost=16385259.94..16596672.47 rows=84565012 width=4)
Sort Key: (COALESCE(COALESCE(COALESCE(t1.a, t2.a), t3.a), t4.a))
-> Merge Full Join (cost=505484.02..1782275.39 rows=84565012 width=4)
Merge Cond: (t4.a = (COALESCE(COALESCE(t1.a, t2.a), t3.a)))
-> Sort (cost=410.57..423.32 rows=5100 width=4)
Sort Key: t4.a
-> Append (cost=0.00..96.50 rows=5100 width=4)
-> Seq Scan on p1 t4 (cost=0.00..35.50 rows=2550 width=4)
-> Seq Scan on p2 t4_1 (cost=0.00..35.50
rows=2550 width=4)
-> Materialize (cost=505073.45..521654.83 rows=3316275 width=12)
-> Sort (cost=505073.45..513364.14 rows=3316275 width=12)
Sort Key: (COALESCE(COALESCE(t1.a, t2.a), t3.a))
-> Merge Full Join (cost=7653.92..32551.10
rows=3316275 width=12)
Merge Cond: (t3.a = (COALESCE(t1.a, t2.a)))
-> Sort (cost=410.57..423.32 rows=5100 width=4)
Sort Key: t3.a
-> Append (cost=0.00..96.50
rows=5100 width=4)
-> Seq Scan on p1 t3
(cost=0.00..35.50 rows=2550 width=4)
-> Seq Scan on p2 t3_1
(cost=0.00..35.50 rows=2550 width=4)
-> Sort (cost=7243.35..7405.91 rows=65024 width=8)
Sort Key: (COALESCE(t1.a, t2.a))
-> Result (cost=359.57..2045.11
rows=65024 width=8)
-> Append
(cost=359.57..2045.11 rows=65024 width=8)
-> Merge Full Join
(cost=359.57..860.00 rows=32512 width=8)
Merge Cond: (t1.a = t2.a)
-> Sort
(cost=179.78..186.16 rows=2550 width=4)
Sort Key: t1.a
-> Seq Scan
on p1 t1 (cost=0.00..35.50 rows=2550 width=4)
-> Sort
(cost=179.78..186.16 rows=2550 width=4)
Sort Key: t2.a
-> Seq Scan
on p1 t2 (cost=0.00..35.50 rows=2550 width=4)
-> Merge Full Join
(cost=359.57..860.00 rows=32512 width=8)
Merge Cond: (t1_1.a = t2_1.a)
-> Sort
(cost=179.78..186.16 rows=2550 width=4)
Sort Key: t1_1.a
-> Seq Scan
on p2 t1_1 (cost=0.00..35.50 rows=2550 width=4)
-> Sort
(cost=179.78..186.16 rows=2550 width=4)
Sort Key: t2_1.a
-> Seq Scan
on p2 t2_1 (cost=0.00..35.50 rows=2550 width=4)

See how it only managed to use partition wise join up to 2-way join,
but gives up at 3-way join and higher, because the join condition
looks like this: t3.a = (COALESCE(t1.a, t2.a). When building the join
relation (t1, t2, t3) between (t3) and (t1, t2), it fails to see that
COALESCE(t1.a, t2.a) actually matches the partition key of (t1, t2).
When I fix the code that does the matching and run with merge joins
disabled, I can get a plan where the whole 4-way join is partitioned:

explain select * from p t1 full outer join p t2 using (a) full outer
join p t3 using (a) full outer join p t4 using (a) order by 1;
QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────
Gather Merge (cost=831480.11..1859235.87 rows=8808720 width=4)
Workers Planned: 2
-> Sort (cost=830480.09..841490.99 rows=4404360 width=4)
Sort Key: (COALESCE(COALESCE(COALESCE(t1.a, t2.a), t3.a), t4.a))
-> Parallel Append (cost=202.12..224012.93 rows=4404360 width=4)
-> Hash Full Join (cost=202.12..201991.13 rows=5285232 width=4)
Hash Cond: (COALESCE(COALESCE(t1.a, t2.a), t3.a) = t4.a)
-> Hash Full Join (cost=134.75..15904.32
rows=414528 width=12)
Hash Cond: (COALESCE(t1.a, t2.a) = t3.a)
-> Hash Full Join (cost=67.38..1247.18
rows=32512 width=8)
Hash Cond: (t1.a = t2.a)
-> Seq Scan on p1 t1
(cost=0.00..35.50 rows=2550 width=4)
-> Hash (cost=35.50..35.50 rows=2550 width=4)
-> Seq Scan on p1 t2
(cost=0.00..35.50 rows=2550 width=4)
-> Hash (cost=35.50..35.50 rows=2550 width=4)
-> Seq Scan on p1 t3
(cost=0.00..35.50 rows=2550 width=4)
-> Hash (cost=35.50..35.50 rows=2550 width=4)
-> Seq Scan on p1 t4 (cost=0.00..35.50
rows=2550 width=4)
-> Hash Full Join (cost=202.12..201991.13 rows=5285232 width=4)
Hash Cond: (COALESCE(COALESCE(t1_1.a, t2_1.a),
t3_1.a) = t4_1.a)
-> Hash Full Join (cost=134.75..15904.32
rows=414528 width=12)
Hash Cond: (COALESCE(t1_1.a, t2_1.a) = t3_1.a)
-> Hash Full Join (cost=67.38..1247.18
rows=32512 width=8)
Hash Cond: (t1_1.a = t2_1.a)
-> Seq Scan on p2 t1_1
(cost=0.00..35.50 rows=2550 width=4)
-> Hash (cost=35.50..35.50 rows=2550 width=4)
-> Seq Scan on p2 t2_1
(cost=0.00..35.50 rows=2550 width=4)
-> Hash (cost=35.50..35.50 rows=2550 width=4)
-> Seq Scan on p2 t3_1
(cost=0.00..35.50 rows=2550 width=4)
-> Hash (cost=35.50..35.50 rows=2550 width=4)
-> Seq Scan on p2 t4_1 (cost=0.00..35.50
rows=2550 width=4)
(31 rows)

But with merge joins enabled:

explain select * from p t1 full outer join p t2 using (a) full outer
join p t3 using (a) full outer join p t4 using (a) order by 1;
ERROR: could not find pathkey item to sort

That's because, there's no child COALESCE(t1_1.a, t2_1.a) expression
in the EC that contains COALESCE(t1.a, t2.a), where t1_1 and t2_1
represent the 1st partition of t1 and t2, resp. The problem is that
add_child_rel_equivalences(), as of d25ea01275, only adds the
following child expressions of COALESCE(t1.a, t2.a):

-- when translating t1
COALESCE(t1_1.a, t2.a)
COALESCE(t1_2.a, t2.a)
-- when translating t2
COALESCE(t1.a, t2_1.a)
COALESCE(t1.a, t2_2.a)

whereas previously, the following would be added too when translating t2:

COALESCE(t1_1.a, t2_1.a)
COALESCE(t1_1.a, t2_2.a)
COALESCE(t1_2.a, t2_1.a)
COALESCE(t1_2.a, t2_2.a)

Note that of those, only COALESCE(t1_1.a, t2_1.a) and COALESCE(t1_2.a,
t2_2.a) are interesting, because partition wise join will only ever
consider pairs (t1_1, t2_1) and (t1_2, t2_2) to be joined.

We can get the needed child expressions and still avoid the
combinatorial explosion in the size of resulting EC members list if we
taught add_child_rel_equivalences() to only translate ECs that the
input parent relation is capable of producing. So, COALESCE(t1.a,
t2.a) will not be translated if the input relation is only (t1) or
(t2), that is, when called from set_append_rel_size(). Instead it
would be translated if it's passed the joinrel (t1, t2). IOW, teach
build_child_join_rel() to call add_child_rel_equivalences(), which
I've tried to implement in the attached.

I have attached two patches.

0001 - fix partitionwise join to work correctly with n-way joins of
which some are full joins (+ cosmetic improvements around the code
that was touched)
0002 - fix to translate multi-relation EC members correctly

Thanks,
Amit

Attachment Content-Type Size
0001-Fix-partitionwise-join-code-to-handle-FULL-OUTER-JOI.patch application/octet-stream 31.0 KB
0002-Translate-multi-relation-EC-members-in-a-separate-pa.patch application/octet-stream 21.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-07-02 10:12:40 Re: Add parallelism and glibc dependent only options to reindexdb
Previous Message Alexander Korotkov 2019-07-02 09:16:14 Re: Support for jsonpath .datetime() method