Re: Asymmetric partition-wise JOIN

From: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kohei KaiGai <kaigai(at)heterodb(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asymmetric partition-wise JOIN
Date: 2021-04-30 03:10:19
Message-ID: de20c214-1b26-b94a-081b-5b9aad3f490c@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/30/20 7:43 PM, Anastasia Lubennikova wrote:
> This entry was inactive during this CF, so I've marked it as returned
> with feedback. Feel free to resubmit an updated version to a future
> commitfest.
I return the patch to commitfest. My current reason differs from reason
of origin author.
This patch can open a door for more complex optimizations in the
partitionwise join push-down technique.
I mean, we can push-down join not only of two partitioned tables with
the same partition schema, but a partitioned (sharded) table with an
arbitrary subplan that is provable independent of local resources.

Example:

CREATE TABLE p(a int) PARTITION BY HASH (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE p2 PARTITION OF p FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE p3 PARTITION OF p FOR VALUES WITH (MODULUS 3, REMAINDER 2);

SELECT * FROM p, (SELECT * FROM generate_series(1,2) AS a) AS s
WHERE p.a=s.a;

Hash Join
Hash Cond: (p.a = a.a)
-> Append
-> Seq Scan on p1 p_1
-> Seq Scan on p2 p_2
-> Seq Scan on p3 p_3
-> Hash
-> Function Scan on generate_series a

But with asymmetric join feature we have the plan:

Append
-> Hash Join
Hash Cond: (p_1.a = a.a)
-> Seq Scan on p1 p_1
-> Hash
-> Function Scan on generate_series a
-> Hash Join
Hash Cond: (p_2.a = a.a)
-> Seq Scan on p2 p_2
-> Hash
-> Function Scan on generate_series a
-> Hash Join
Hash Cond: (p_3.a = a.a)
-> Seq Scan on p3 p_3
-> Hash
-> Function Scan on generate_series a

In the case of FDW-sharding it means that if we can prove that the inner
relation is independent from the execution server, we can push-down
these joins and execute it in parallel.

--
regards,
Andrey Lepikhov
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-04-30 03:12:47 Re: Replication slot stats misgivings
Previous Message David Rowley 2021-04-30 02:50:33 Re: Result Cache node shows per-worker info even for workers not launched