Re: [HACKERS] advanced partition matching algorithm for partition-wise join

From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: amul sul <sulamul(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] advanced partition matching algorithm for partition-wise join
Date: 2019-11-22 13:08:15
Message-ID: CAPmGK15kXh76EnRn=B64u=+qLxZOKWROd4uMjMBnWcsZPdQ_mw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 15, 2019 at 6:10 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
> On Fri, Nov 15, 2019 at 2:21 PM amul sul <sulamul(at)gmail(dot)com> wrote:
> > Thank you Fujita san for the patch & the enhancements. I am fine with your
> > delta patch.
>
> OK, I'll merge the delta patch with the main one in the next version,
> if no objections from others.
>
> > I would like to share some thought for other code:
> >
> > File: partbounds.c:
> > 3298 static void
> > 3299 get_merged_range_bounds(int partnatts, FmgrInfo *partsupfuncs,
> > 3300 Oid *partcollations, JoinType jointype,
> > 3301 PartitionRangeBound *left_lb,
> > 3302 PartitionRangeBound *left_ub,
> > 3303 PartitionRangeBound *right_lb,
> > 3304 PartitionRangeBound *right_ub,
> > 3305 PartitionRangeBound *merged_lb,
> > 3306 PartitionRangeBound *merged_ub)
> >
> > Can we rename these argument as inner_* & outer_* like we having for the
> > functions, like 0003 patch?
>
> +1 (Actually, I too was thinking about that.)
>
> > File: partbounds.c:
> > 3322
> > 3323 case JOIN_INNER:
> > 3324 case JOIN_SEMI:
> > 3325 if (compare_range_bounds(partnatts, partsupfuncs, partcollations,
> > 3326 left_ub, right_ub) < 0)
> > 3327 *merged_ub = *left_ub;
> > 3328 else
> > 3329 *merged_ub = *right_ub;
> > 3330
> > 3331 if (compare_range_bounds(partnatts, partsupfuncs, partcollations,
> > 3332 left_lb, right_lb) > 0)
> > 3333 *merged_lb = *left_lb;
> > 3334 else
> > 3335 *merged_lb = *right_lb;
> > 3336 break;
> > 3337
> >
> > How about reusing ub_cmpval & lb_cmpval instead of calling
> > compare_range_bounds() inside get_merged_range_bounds(), like 0004 patch?
>
> Good idea! So, +1
>
> > Apart from this, I would like to propose 0005 cleanup patch where I have
> > rearranged function arguments & code to make sure the arguments & the code
> > related to lower bound should come first and then the upper bound.
>
> +1
>
> I'll merge these changes as well into the main patch, if no objections
> of others.

Done. I modified compare_range_partitions() as well to match its the
variable names with others. Attached is a new version of the patch.

I reviewed the rest of the partbounds.c changes. Here are my review comments:

* I don't think this analysis is correct:

+/*
+ * merge_null_partitions
+ *
+ * Merge NULL partitions, i.e. a partition that can hold NULL values for a lis\
t
+ * partitioned table, if any. Find the index of merged partition to which the
+ * NULL values would belong in the join result. If one joining relation has a
+ * NULL partition but not the other, try matching it with the default partitio\
n
+ * from the other relation since the default partition may have rows with NULL
+ * partition key. We can eliminate a NULL partition when it appears only on th\
e
+ * inner side of the join and the outer side doesn't have a default partition.
+ *
+ * When the equality operator used for join is strict, two NULL values will no\
t
+ * be considered as equal, and thus a NULL partition can be eliminated for an
+ * inner join. But we don't check the strictness operator here.
+ */

First of all, I think we can assume here that the equality operator is
*strict*, because 1) have_partkey_equi_join(), which is executed
before calling merge_null_partitions(), requires that the
corresponding clause is mergejoinable, and 2) currently, we assume
that mergejoinable operators are strict (see MJEvalOuterValues() and
MJEvalInnerValues()). So I don't think it's needed to try merging a
NULL partition on one side with the default partition on the other
side as above. (merge_null_partitions() tries merging NULL partitions
as well, but in some cases I don't think we need to do that, either.)
So I rewrote merge_null_partitions() completely. Another change is
that I eliminated the NULL partition of the joinrel in more cases.
Attached is a patch (v28-0002-modify-merge_null_partitions.patch) for
that created on top of the main patch. I might be missing something,
though.

Other changes in that patch:

* I fixed memory leaks in partition_list_bounds_merge() and
partition_range_bounds_merge().

* I noticed this in merge_default_partitions():

+ Assert(outer_has_default && inner_has_default);
+
+ *default_index = map_and_merge_partitions(outer_map,
+ inner_map,
+ outer_default,
+ inner_default,
+ next_index);
+ if (*default_index == -1)
+ return false;

I think the merging should be done successfully, because of 1) this in
process_outer_partition():

+ if (inner_has_default)
+ {
+ Assert(inner_default >= 0);
+
+ /*
+ * If the outer side has the default partition as well, we need to
+ * merge the default partitions (see merge_default_partitions()); give
+ * up on it.
+ */
+ if (outer_has_default)
+ return false;

and 2) this in process_inner_partition():

+ if (outer_has_default)
+ {
+ Assert(outer_default >= 0);
+
+ /*
+ * If the inner side has the default partition as well, we need to
+ * merge the default partitions (see merge_default_partitions()); give
+ * up on it.
+ */
+ if (inner_has_default)
+ return false;

So I removed the above if test (ie, *default_index == -1). I also
modified that function a bit further, including comments.

* I simplified process_outer_partition() and process_inner_partition()
a bit, changing the APIs to match that of map_and_merge_partitions().
Also, I think this in these functions is a bit ugly;

+ /* Don't add this index to the list of merged indexes. */
+ *merged_index = -1;

so I removed it, and modified the condition on whether or not we add
merged bounds to the lists in partition_list_bounds_merge() and
partition_range_bounds_merge(), instead.

That's it. Sorry for the delay.

Best regards,
Etsuro Fujita

Attachment Content-Type Size
v28-0001-Improve-partition-matching-for-partitionwise-join.patch application/octet-stream 310.9 KB
v28-0002-Modify-merge_null_partitions.patch application/octet-stream 42.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2019-11-22 13:22:04 Re: TAP tests aren't using the magic words for Windows file access
Previous Message Michael Paquier 2019-11-22 12:56:03 Re: base backup client as auxiliary backend process