Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-09-04 11:38:17
Message-ID: CAFjFpRe-8P5pzfET4YZRH0Vawd0_o2TK4_zo+AjZ04mfCB3O0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 1, 2017 at 6:05 PM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>
>> I originally thought to provide it along-with the changes to
>> expand_inherited_rtentry(), but that thread is taking longer. Jeevan
>> Chalke needs rebased patches for his work on aggregate pushdown and
>> Thomas might need them for further review. So, here they are.
>
> Since I have related patch in the current commitfest
> (https://commitfest.postgresql.org/14/1247/), I spent some time reviewing your
> patch:
>
> * generate_partition_wise_join_paths()
>
> Right parenthesis is missing in the prologue.

Thanks for pointing that out. Fixed.

>
>
> * get_partitioned_child_rels_for_join()
>
> I think the Assert() statement is easier to understand inside the loop, see
> the assert.diff attachment.

The assert at the end of function also checks that we have got
child_rels lists for all the parents passed in. That is not checked by
your version. Furthermore, we would checked that each child_rels has
at least one element while buildings paths for base relations.
Checking the same again for joins doesn't add any value.

>
>
> * have_partkey_equi_join()
>
> As the function handles generic join, this comment doesn't seem to me
> relevant:
>
> /*
> * The equi-join between partition keys is strict if equi-join between
> * at least one partition key is using a strict operator. See
> * explanation about outer join reordering identity 3 in
> * optimizer/README
> */
> strict_op = op_strict(opexpr->opno);

What in that comment is not exactly relevant?

>
> And I think the function can return true even if strict_op is false for all
> the operators evaluated in the loop.

I think it does that. Do you have a case where it doesn't?

>
>
> * match_expr_to_partition_keys()
>
> I'm not sure this comment is clear enough:
>
> /*
> * If it's a strict equi-join a NULL partition key on one side will
> * not join a NULL partition key on the other side. So, rows with NULL
> * partition key from a partition on one side can not join with those
> * from a non-matching partition on the other side. So, search the
> * nullable partition keys as well.
> */
> if (!strict_op)
> continue;
>
> My understanding of the problem of NULL values generated by outer join is:
> these NULL values --- if evaluated by non-strict expression --- can make row
> of N-th partition on one side of the join match row(s) of *other than* N-th
> partition(s) on the other side. Thus the nullable input expressions may only
> be evaluated by strict operators. I think it'd be clearer if you stressed that
> (undesired) *match* of partition keys can be a problem, rather than mismatch

Sorry, I am not able to understand this. To me it looks like my
wording conveys what you are saying.

>
> If you insist on your wording, then I think you should at least move the
> comment below to the part that only deals with strict operators.

Done.

>
>
> * There are several places where lfirst_node() macro should be used. For
> example
>
> rel = lfirst_node(RelOptInfo, lc);
>
> instead of
>
> rel = (RelOptInfo *) lfirst(lc);

Thanks for that.

>
>
> * map_and_merge_partitions()
>
> Besides a few changes proposed in map_and_merge_partitions.diff (a few of them
> to suppress compiler warnings) I think that this part needs more thought:
>
> {
> Assert(mergemap1[index1] != mergemap2[index2] &&
> mergemap1[index1] >= 0 && mergemap2[index2] >= 0);
>
> /*
> * Both the partitions map to different merged partitions. This
> * means that multiple partitions from one relation matches to one
> * partition from the other relation. Partition-wise join does not
> * handle this case right now, since it requires ganging multiple
> * partitions together (into one RelOptInfo).
> */
> merged_index = -1;
> }
>
> I could hit this path with the following test:
>
> CREATE TABLE a(i int) PARTITION BY LIST(i);
> CREATE TABLE a_0 PARTITION OF a FOR VALUES IN (0, 2);
> CREATE TABLE b(j int) PARTITION BY LIST(j);
> CREATE TABLE b_0 PARTITION OF b FOR VALUES IN (1, 2);
>
> SET enable_partition_wise_join TO on;
>
> SELECT *
> FROM a
> FULL JOIN
> b ON i = j;
>
> I don't think there's a reason not to join a_0 partition to b_0, is there?

With the latest patchset I am seeing that partition-wise join is used
in this case. I have started a new thread [1] for advanced partition
matching patches. Please post review comments about the last two
patches on that thread.

[1] https://www.postgresql.org/message-id/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com

Attached patchset with above comments addressed.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_dp_join_patches_v28.tar.gz application/x-gzip 159.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-09-04 11:39:58 Re: Release Note changes
Previous Message Sokolov Yura 2017-09-04 11:20:52 Re: Walsender timeouts and large transactions