Re: [HACKERS] advanced partition matching algorithm for partition-wise join

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] advanced partition matching algorithm for partition-wise join
Date: 2018-07-15 17:43:43
Message-ID: CA+q6zcVf2RtHjRZ9wuhUqvBmXdq18RWSwtG+a9eo+xCw8c53uw@mail.gmail.com
Lists: pgsql-hackers

> On Thu, 28 Jun 2018 at 07:54, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> On 2018/06/27 22:21, Ashutosh Bapat wrote:
> > On Wed, Jun 27, 2018 at 12:28 PM, Amit Langote
> >> Ah, okay. I thought of reporting this because I felt the errors may have
> >> to do with changes to the related code in HEAD between May 14 when you
> >> last posted the patches and today that you may need to account for in your
> >> patches. For instance, there are many diffs like this:
> >>
> >> Looks like the Result node on top of Append is no longer there after
> >> applying your patch.
> >
> > Yes. They are coming because of a commit which removed Result node on
> > top of an Append node. I don't remember exactly which.
> >
> > I wouldn't worry about those diffs at this time. As I have mentioned
> > in earlier mails, the expected output from 0006 is quite large and is
> > not supposed to be committed. So, I don't see much value in fixing the
> > plans in that output.
> >
> > Do you see that as a hindrance in reviewing the code changes and tests in 0005?
>
> I think not. I'll ignore 0006 for now and focus on other patches.

Hi,

Sorry for my irregular reviews. Unfortunately, the patch needs to be rebased
again. In the meantime I have a few more comments about range_bounds_merge:

* From what I can see, partition_range_bound_cmp is executed on the same bounds
  several times, both to find out whether they overlap and then again during the
  range merging. Is that necessary? It would probably be enough to compare the
  current pair of ranges once per iteration.

* Just to clarify: is iterating through all the partitions really the best way
  of finding matching ranges? Correct me if I'm wrong, but from what I see in
  the comments for PartitionBoundInfoData, all the ranges are arranged in
  increasing order, so maybe it's possible to use binary search to find a
  matching range?

* I'm not sure why partition-wise join is not applied in the following case.
  Maybe I'm wrong, but I was under the impression that it should work here:

=# \d+ test1;
                          Table "public.test1"
 Column |  Type   | Collation | Nullable | Default | Storage
--------+---------+-----------+----------+---------+----------
 data   | jsonb   |           |          |         | extended
 id     | integer |           |          |         | plain
Partition key: RANGE (id)
Indexes:
    "test1_idx" btree (id)
Partitions: test11 FOR VALUES FROM (0) TO (100),
            test12 FOR VALUES FROM (100) TO (200),
            test13 FOR VALUES FROM (200) TO (300)

=# \d+ test2;
                          Table "public.test2"
 Column |  Type   | Collation | Nullable | Default | Storage
--------+---------+-----------+----------+---------+----------
 data   | jsonb   |           |          |         | extended
 id     | integer |           |          |         | plain
Partition key: RANGE (id)
Indexes:
    "test2_idx" btree (id)
Partitions: test21 FOR VALUES FROM (10) TO (110),
            test22 FOR VALUES FROM (110) TO (210),
            test23 FOR VALUES FROM (210) TO (310)
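
For anyone who wants to reproduce this, the two tables can be recreated with
DDL along these lines (reconstructed from the \d+ output above; the test data
is not included here):

CREATE TABLE test1 (data jsonb, id integer) PARTITION BY RANGE (id);
CREATE TABLE test11 PARTITION OF test1 FOR VALUES FROM (0) TO (100);
CREATE TABLE test12 PARTITION OF test1 FOR VALUES FROM (100) TO (200);
CREATE TABLE test13 PARTITION OF test1 FOR VALUES FROM (200) TO (300);
CREATE INDEX test1_idx ON test1 (id);

CREATE TABLE test2 (data jsonb, id integer) PARTITION BY RANGE (id);
CREATE TABLE test21 PARTITION OF test2 FOR VALUES FROM (10) TO (110);
CREATE TABLE test22 PARTITION OF test2 FOR VALUES FROM (110) TO (210);
CREATE TABLE test23 PARTITION OF test2 FOR VALUES FROM (210) TO (310);
CREATE INDEX test2_idx ON test2 (id);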

=# set enable_partitionwise_join to true;

=# explain analyze select * from test1 t1 inner join test2 t2 using (id);
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Hash Join  (cost=3.25..6.56 rows=9 width=54)
            (actual time=0.082..0.105 rows=3 loops=1)
   Hash Cond: (t1.id = t2.id)
   ->  Append  (cost=0.00..3.18 rows=12 width=29)
               (actual time=0.026..0.047 rows=12 loops=1)
         ->  Seq Scan on test11 t1  (cost=0.00..1.05 rows=5 width=29)
                                    (actual time=0.025..0.028 rows=5 loops=1)
         ->  Seq Scan on test12 t1_1  (cost=0.00..1.04 rows=4 width=29)
                                      (actual time=0.006..0.007 rows=4 loops=1)
         ->  Seq Scan on test13 t1_2  (cost=0.00..1.03 rows=3 width=29)
                                      (actual time=0.005..0.006 rows=3 loops=1)
   ->  Hash  (cost=3.13..3.13 rows=9 width=29)
             (actual time=0.033..0.033 rows=9 loops=1)
         Buckets: 1024 Batches: 1 Memory Usage: 9kB
         ->  Append  (cost=0.00..3.13 rows=9 width=29)
                     (actual time=0.006..0.022 rows=9 loops=1)
               ->  Seq Scan on test21 t2  (cost=0.00..1.03 rows=3 width=29)
                                          (actual time=0.005..0.008 rows=3 loops=1)
               ->  Seq Scan on test22 t2_1  (cost=0.00..1.03 rows=3 width=29)
                                            (actual time=0.004..0.005 rows=3 loops=1)
               ->  Seq Scan on test23 t2_2  (cost=0.00..1.03 rows=3 width=29)
                                            (actual time=0.003..0.004 rows=3 loops=1)
 Planning Time: 0.921 ms
 Execution Time: 0.261 ms
(14 rows)

=# set enable_partitionwise_join to false;
=# explain analyze select * from test1 t1 inner join test2 t2 using (id);
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Hash Join  (cost=3.25..6.56 rows=9 width=54)
            (actual time=0.073..0.095 rows=3 loops=1)
   Hash Cond: (t1.id = t2.id)
   ->  Append  (cost=0.00..3.18 rows=12 width=29)
               (actual time=0.022..0.041 rows=12 loops=1)
         ->  Seq Scan on test11 t1  (cost=0.00..1.05 rows=5 width=29)
                                    (actual time=0.021..0.024 rows=5 loops=1)
         ->  Seq Scan on test12 t1_1  (cost=0.00..1.04 rows=4 width=29)
                                      (actual time=0.006..0.007 rows=4 loops=1)
         ->  Seq Scan on test13 t1_2  (cost=0.00..1.03 rows=3 width=29)
                                      (actual time=0.004..0.005 rows=3 loops=1)
   ->  Hash  (cost=3.13..3.13 rows=9 width=29)
             (actual time=0.031..0.031 rows=9 loops=1)
         Buckets: 1024 Batches: 1 Memory Usage: 9kB
         ->  Append  (cost=0.00..3.13 rows=9 width=29)
                     (actual time=0.006..0.021 rows=9 loops=1)
               ->  Seq Scan on test21 t2  (cost=0.00..1.03 rows=3 width=29)
                                          (actual time=0.005..0.008 rows=3 loops=1)
               ->  Seq Scan on test22 t2_1  (cost=0.00..1.03 rows=3 width=29)
                                            (actual time=0.003..0.004 rows=3 loops=1)
               ->  Seq Scan on test23 t2_2  (cost=0.00..1.03 rows=3 width=29)
                                            (actual time=0.003..0.004 rows=3 loops=1)
 Planning Time: 1.154 ms
 Execution Time: 0.201 ms
(14 rows)

My investigation shows that the merge function stops on the second iteration
because of this condition:

/*
 * Multiple partitions from one relation map to one partition from the
 * other relation. Partition-wise join does not handle this case right
 * now, since it requires ganging multiple partitions together (into
 * one RelOptInfo).
 */
merged_index = -1;

I'm confused, since there is only a one-to-one mapping of partitions here.
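
In case it's useful for cross-checking the bound pairs, they can also be listed
directly from the catalogs (a standard pg_inherits/pg_class lookup, nothing
specific to the patch):

SELECT c.relname, pg_get_expr(c.relpartbound, c.oid) AS bounds
FROM pg_class c
JOIN pg_inherits i ON i.inhrelid = c.oid
WHERE i.inhparent IN ('test1'::regclass, 'test2'::regclass)
ORDER BY c.relname;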
