Support run-time partition pruning for hash join

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Support run-time partition pruning for hash join
Date: 2023-08-21 03:48:07
Message-ID: CAMbWs49atE64_JjqrF2+BBZvG-h7afL89wCghg_GmvOHPB_LbQ@mail.gmail.com
Lists: pgsql-hackers

If we have a hash join with an Append node on the outer side, something
like

Hash Join
  Hash Cond: (pt.a = t.a)
  ->  Append
        ->  Seq Scan on pt_p1 pt_1
        ->  Seq Scan on pt_p2 pt_2
        ->  Seq Scan on pt_p3 pt_3
  ->  Hash
        ->  Seq Scan on t

We can actually prune those subnodes of the Append that cannot possibly
contain any matching tuples from the other side of the join. To do
that, while building the hash table, for each row from the inner side
we compute the minimum set of subnodes that could possibly match the
join condition. By the time the hash table is built and the Append
node starts to execute, we know which subnodes survive and can skip
the others.
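
To make that concrete, here is a toy standalone C sketch (not the
patch's code): the hard-coded partition bounds and the plain bitmask
stand in for the real partition-bound lookup and Bitmapset machinery.
While each inner row is loaded into the hash table, we also record
which outer partition could possibly produce a join match.

#include <stdio.h>

/* Toy partition bounds standing in for pt's range partitions:
 * pt_p1 = [0, 250), pt_p2 = [250, 500), pt_p3 = [500, 600). */
#define NPARTS 3
static const int part_lower[NPARTS] = {0, 250, 500};
static const int part_upper[NPARTS] = {250, 500, 600};

/* Map a join-key value to the only partition that could contain it. */
static int
map_key_to_partition(int key)
{
    for (int i = 0; i < NPARTS; i++)
        if (key >= part_lower[i] && key < part_upper[i])
            return i;
    return -1;                  /* no partition can match this key */
}

int
main(void)
{
    /* Inner-side rows being loaded into the hash table (t.a values). */
    const int inner_keys[] = {10, 12, 480};
    const int ninner = sizeof(inner_keys) / sizeof(inner_keys[0]);
    unsigned int surviving = 0; /* bitmask of Append subnodes to keep */

    /* While inserting each inner row into the hash table, also record
     * which outer partition could possibly produce a join match. */
    for (int i = 0; i < ninner; i++)
    {
        int part = map_key_to_partition(inner_keys[i]);

        if (part >= 0)
            surviving |= 1u << part;
    }

    /* Once the hash table is built, the Append skips unmarked subnodes. */
    for (int p = 0; p < NPARTS; p++)
        printf("pt_p%d: %s\n", p + 1,
               (surviving & (1u << p)) ? "scan" : "skip");
    return 0;
}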

This kind of partition pruning can be extended to happen across multiple
join levels. For instance,

Hash Join
  Hash Cond: (pt.a = t2.a)
  ->  Hash Join
        Hash Cond: (pt.a = t1.a)
        ->  Append
              ->  Seq Scan on pt_p1 pt_1
              ->  Seq Scan on pt_p2 pt_2
              ->  Seq Scan on pt_p3 pt_3
        ->  Hash
              ->  Seq Scan on t1
  ->  Hash
        ->  Seq Scan on t2

We can compute the matching subnodes of the Append when building the
hash table for 't1' according to the join condition 'pt.a = t1.a', and
again when building the hash table for 't2' according to the join
condition 'pt.a = t2.a'; the final set of surviving subnodes is the
intersection of the two.
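
In the same toy representation as above, combining the per-Hash-node
results is just a bitwise intersection (again a sketch, not the
patch's code):

/* Sketch: each Hash node contributes one surviving-partition mask; the
 * Append only scans partitions present in every mask. */
static unsigned int
combine_pruning_results(const unsigned int *masks, int nmasks)
{
    unsigned int result = ~0u;      /* start from "all partitions survive" */

    for (int i = 0; i < nmasks; i++)
        result &= masks[i];         /* keep only commonly matched partitions */
    return result;
}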

Greenplum [1] has implemented this kind of partition pruning as
'Partition Selector'. Attached is a patch that refactors Greenplum's
implementation to make it work on PostgreSQL master. Here are some
details about the patch.

During planning:

1. When creating a hash join plan in create_hashjoin_plan(), we first
   collect the information required to build PartitionPruneInfos at
   this join, namely the join's RestrictInfos and the join's inner
   relids, and push it onto a stack.

2. When create_append_plan() is called for an appendrel, for each of
   the joins on the stack we check whether join partition pruning can
   take place for this appendrel, based on the information collected
   at that join, and if so we build a PartitionPruneInfo and add it to
   that stack entry.

3. After finishing the outer side of the hash join, we have built all
   the PartitionPruneInfos that can be used to perform join partition
   pruning at this join, so we pop the stack entry and attach the
   PartitionPruneInfos to the Hash node.
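
To illustrate the stack discipline in steps 1-3, here is a schematic
sketch; JoinPruneStackEntry and the push/attach/pop helpers below are
names I made up for illustration and do not match the patch, and the
check of whether pruning is applicable for a given appendrel at a
given join is elided.

#include "postgres.h"

#include "nodes/bitmapset.h"
#include "nodes/pg_list.h"
#include "nodes/plannodes.h"

/* Hypothetical stack entry, for illustration only. */
typedef struct JoinPruneStackEntry
{
    List       *joinrestrictinfo;   /* join clauses of the hash join */
    Bitmapset  *inner_relids;       /* relids of the join's inner side */
    List       *prune_infos;        /* PartitionPruneInfos added by Appends */
} JoinPruneStackEntry;

static List *join_prune_stack = NIL;

/* Step 1: create_hashjoin_plan() pushes before building the outer subtree. */
static void
push_join_prune_entry(List *joinrestrictinfo, Bitmapset *inner_relids)
{
    JoinPruneStackEntry *entry = palloc0(sizeof(JoinPruneStackEntry));

    entry->joinrestrictinfo = joinrestrictinfo;
    entry->inner_relids = inner_relids;
    join_prune_stack = lcons(entry, join_prune_stack);
}

/* Step 2: create_append_plan() may add a PartitionPruneInfo to each
 * enclosing join's entry (the applicability check is elided here). */
static void
attach_prune_info(PartitionPruneInfo *pinfo)
{
    ListCell   *lc;

    foreach(lc, join_prune_stack)
    {
        JoinPruneStackEntry *entry = (JoinPruneStackEntry *) lfirst(lc);

        entry->prune_infos = lappend(entry->prune_infos, pinfo);
    }
}

/* Step 3: back in create_hashjoin_plan(), pop and hand the infos to Hash. */
static List *
pop_join_prune_entry(void)
{
    JoinPruneStackEntry *entry = linitial(join_prune_stack);

    join_prune_stack = list_delete_first(join_prune_stack);
    return entry->prune_infos;
}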

During execution:

When building the hash table for a hash join, we perform the partition
pruning for each row according to each of the JoinPartitionPruneStates
at this join, and store each result in a special executor parameter so
that it is available to the Append nodes. When executing an Append
node, we can then directly use the pre-computed pruning results to
skip subnodes that cannot contain any matching rows.
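
Schematically, the handoff between the Hash node and the Append looks
like this (invented names; a toy bitmask array stands in for the
special executor parameters):

#include <stdbool.h>

/* Toy sketch of the executor-side handoff, for illustration only. */
#define MAX_PRUNE_PARAMS 8

static unsigned int prune_params[MAX_PRUNE_PARAMS]; /* one slot per param */

/* Hash side: called for each inner row while the hash table is built. */
static void
record_matching_partition(int param_id, int part_index)
{
    if (part_index >= 0)
        prune_params[param_id] |= 1u << part_index;
}

/* Append side: consult the pre-computed result before running a subnode. */
static bool
subnode_survives(int param_id, int part_index)
{
    return (prune_params[param_id] & (1u << part_index)) != 0;
}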

Here is a query that shows the effect of the join partition pruning.

CREATE TABLE pt (a int, b int, c varchar) PARTITION BY RANGE(a);
CREATE TABLE pt_p1 PARTITION OF pt FOR VALUES FROM (0) TO (250);
CREATE TABLE pt_p2 PARTITION OF pt FOR VALUES FROM (250) TO (500);
CREATE TABLE pt_p3 PARTITION OF pt FOR VALUES FROM (500) TO (600);
INSERT INTO pt SELECT i, i % 25, to_char(i, 'FM0000') FROM
generate_series(0, 599) i WHERE i % 2 = 0;

CREATE TABLE t1 (a int, b int);
INSERT INTO t1 values (10, 10);

CREATE TABLE t2 (a int, b int);
INSERT INTO t2 values (300, 300);

ANALYZE pt, t1, t2;

SET enable_nestloop TO off;

explain (analyze, costs off, summary off, timing off)
select * from pt join t1 on pt.a = t1.a right join t2 on pt.a = t2.a;
                         QUERY PLAN
------------------------------------------------------------
 Hash Right Join (actual rows=1 loops=1)
   Hash Cond: (pt.a = t2.a)
   ->  Hash Join (actual rows=0 loops=1)
         Hash Cond: (pt.a = t1.a)
         ->  Append (actual rows=0 loops=1)
               ->  Seq Scan on pt_p1 pt_1 (never executed)
               ->  Seq Scan on pt_p2 pt_2 (never executed)
               ->  Seq Scan on pt_p3 pt_3 (never executed)
         ->  Hash (actual rows=1 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Seq Scan on t1 (actual rows=1 loops=1)
   ->  Hash (actual rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on t2 (actual rows=1 loops=1)
(14 rows)

There are several points that need more consideration.

1. All the join partition pruning decisions are made in createplan.c,
   where the best path tree has already been decided. This is not
   great. Maybe it would be better to make this happen while building
   the path tree, so that we can take partition pruning into
   consideration when estimating the costs.

2. In order for the join partition pruning to take effect, the patch
   hacks the empty-outer optimization in ExecHashJoinImpl(). Not sure
   if this is good practice.

3. This patch does not support parallel hash join yet, but it should
   not be hard to add that support.

4. Is it possible and worthwhile to extend the join partition pruning
   mechanism to nestloop and merge joins as well?

Any thoughts or comments?

[1] https://github.com/greenplum-db/gpdb

Thanks
Richard

Attachment: v1-0001-Support-run-time-partition-pruning-for-hash-join.patch (application/octet-stream, 54.2 KB)
