Re: [HACKERS] advanced partition matching algorithm for partition-wise join

From: amul sul <sulamul(at)gmail(dot)com>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] advanced partition matching algorithm for partition-wise join
Date: 2019-09-02 05:08:02
Message-ID: CAAJ_b94tJTix3kR8uBjin-ruJ-7ojn-gAWJQRicbLqAttQTe1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Fujita San,

Please find my comments inline below:

On Wed, Aug 28, 2019 at 3:52 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
wrote:

> On Fri, Aug 16, 2019 at 10:25 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
> wrote:
> >

[... skipped ..]

>
> About the attached:
>
> * The attached patch modified try_partitionwise_join() so that we call
> partition_bounds_equal() then partition_bounds_merge() (if necessary)
> to create the partition bounds for the join rel. We don't support for
> merging the partition bounds for the hash-partitioning case, so this
> makes code to handle the hash-partitioning case in
> partition_bounds_merge() completely unnecessary, so I removed that
> entirely.
>

Yes, that make sense.

On thinking further, a thought hits to me why we can't join two hash
partitioned
table which has the same modulus and partition key specification, but some
of
the partitions are missing from either partitioned table.

For e.g. here is a smaller case where foo has two partitions and bar has
only one.

CREATE TABLE foo(a int) PARTITION BY HASH(a);
CREATE TABLE foo_p0 PARTITION OF foo FOR VALUES WITH(modulus 2, remainder
0);
CREATE TABLE foo_p1 PARTITION OF foo FOR VALUES WITH(modulus 2, remainder
1);

CREATE TABLE bar(a int) PARTITION BY HASH(a); <-- missing partitions
for REMAINDER 1
CREATE TABLE bar_p0 PARTITION OF bar FOR VALUES WITH(modulus 2, remainder
0);

Explain:
postgres=# explain select * from foo p1, bar p2 where p1.a = p2.a;
QUERY PLAN

---------------------------------------------------------------------------------
Merge Join (cost=590.35..1578.47 rows=65025 width=8)
Merge Cond: (p2.a = p1.a)
-> Sort (cost=179.78..186.16 rows=2550 width=4)
Sort Key: p2.a
-> Seq Scan on bar_p0 p2 (cost=0.00..35.50 rows=2550 width=4)
-> Sort (cost=410.57..423.32 rows=5100 width=4)
Sort Key: p1.a
-> Append (cost=0.00..96.50 rows=5100 width=4)
-> Seq Scan on foo_p0 p1 (cost=0.00..35.50 rows=2550
width=4)
-> Seq Scan on foo_p1 p1_1 (cost=0.00..35.50 rows=2550
width=4)
(10 rows)

The partitions-wise join will be performed only if we fill the partition
hole that
exists for the joining table i.e. adding partitions to bar table.

postgres=# CREATE TABLE bar_p1 PARTITION OF bar FOR VALUES WITH(modulus 2,
remainder 1);
CREATE TABLE
postgres=# explain select * from foo p1, bar p2 where p1.a = p2.a;
QUERY PLAN

---------------------------------------------------------------------------------
Append (cost=359.57..2045.11 rows=65024 width=8)
-> Merge Join (cost=359.57..860.00 rows=32512 width=8)
Merge Cond: (p1.a = p2.a)
-> Sort (cost=179.78..186.16 rows=2550 width=4)
Sort Key: p1.a
-> Seq Scan on foo_p0 p1 (cost=0.00..35.50 rows=2550
width=4)
-> Sort (cost=179.78..186.16 rows=2550 width=4)
Sort Key: p2.a
-> Seq Scan on bar_p0 p2 (cost=0.00..35.50 rows=2550
width=4)
-> Merge Join (cost=359.57..860.00 rows=32512 width=8)
Merge Cond: (p1_1.a = p2_1.a)
-> Sort (cost=179.78..186.16 rows=2550 width=4)
Sort Key: p1_1.a
-> Seq Scan on foo_p1 p1_1 (cost=0.00..35.50 rows=2550
width=4)
-> Sort (cost=179.78..186.16 rows=2550 width=4)
Sort Key: p2_1.a
-> Seq Scan on bar_p1 p2_1 (cost=0.00..35.50 rows=2550
width=4)
(17 rows)

It would have been nice if we could support this case, as we do allow
partitions
hole for the other partition scheme, but there wouldn't be much objection if
we don't want to add this support for now since there will be a lesser
chance
that hash partitioned table has the hole, IMO.

> * I removed this assertion in partition_bounds_merge(), because I
> think this is covered by two assertions above this.
>
> + Assert((*outer_parts == NIL || *inner_parts != NIL) &&
> + (*inner_parts == NIL || *outer_parts != NIL));
>
> * (I forgot to mention this in a previous email, but) I removed this
> bit of generate_matching_part_pairs(), because we already have the
> same check in try_partitionwise_join(), so this bit would be redundant
> IIUC.
>

Looks good.

>
> * I added more comments.
>

Thanks.

> If there are no objections, I'll merge the attached with the base patch in
> [1].
>

The proposed enhancement in the patch is too good and the patch is pretty
much
reasonable to merge into the main patch.

Here are the few cosmetic fixes for this path I think is needed. Feel free
to
ignore if some of them do not make sense or too obvious.

Note: left side number represents code line number of the patch.

118 + }
119 +
120 + /*
121 + * Try to create the partition bounds along with join pairs.
122 + */
123 + if (boundinfo == NULL)
124 + {

Can we add this block as else part of previous if condition checking equal
partitions bound?

133 + Assert(list_length(parts1) == list_length(parts2));
134 + if (boundinfo == NULL)
135 + {
136 + joinrel->nparts = 0;
137 + return;
138 + }
139 + nparts = list_length(parts1);

Can we move the assert check below at line#139 in the patch i.e. after if
block.

And the question is do we need to do that assert check since
partition_bounds_merge()
does that just before returning, thoughts?

204 + RelOptInfo *child_rel1 = merged ? (RelOptInfo *) lfirst(lcr1) :
rel1->part_rels[cnt_parts];
205 + RelOptInfo *child_rel2 = merged ? (RelOptInfo *) lfirst(lcr2) :
rel2->part_rels[cnt_parts];

How about using lfirst_node instead of lfirst & casting explicitly?

Also, these lines crossing 80 column length which I think we need to fix.
How about
doing the assignment as follow, just after the variable declaration part:

if (merged)
{
child_rel1 = lfirst_node(lcr1, RelOptInfo);
child_rel2 = lfirst_node(lcr2, RelOptInfo);
lcr1 = lnext(parts1, lcr1);
lcr2 = lnext(parts2, lcr2);
}
else
{
child_rel1 = rel1->part_rels[cnt_parts];
child_rel2 = rel2->part_rels[cnt_parts]
}

rel1_empty = (child_rel1 == NULL || IS_DUMMY_REL(child_rel1));
rel2_empty = (child_rel2 == NULL || IS_DUMMY_REL(child_rel2));

266 + * get_matching_part_pairs
267 + * Generate join pairs of partitions for the two inputs for the
join rel

Can we rewrite this description as " Generate join pairs of partitions for
the
join rel from the two inputs." OR "Generate join pairs of partitions for
the
two inputs"

310 + Assert(bms_num_members(child_relids1) ==
bms_num_members(rel1->relids));
311 + /*

335 + Assert(bms_num_members(child_relids2) ==
bms_num_members(rel2->relids));
336 + /*

Need newline after assert statements.

Regards,
Amul

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2019-09-02 05:11:42 Re: Bug in GiST paring heap comparator
Previous Message Fabien COELHO 2019-09-02 04:58:42 Re: moonjelly vs tsearch