Re: [HACKERS] advanced partition matching algorithm for partition-wise join

From: Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] advanced partition matching algorithm for partition-wise join
Date: 2020-04-01 17:12:01
Message-ID: CAG-ACPVkw+cLM0HNsZACJxq=M8JFn7cfH-eLLzBRZDkD9ZN+2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 26 Mar 2020 at 00:35, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> Hi,
>
> I've started reviewing the patch a couple days ago. I haven't done any
> extensive testing, but I do have a bunch of initial comments that I can
> share now.
>
> 1) I wonder if this needs to update src/backend/optimizer/README, which
> does have a section about partitionwise joins. It seems formulated in a
> way that that probably covers even this more advanced algorithm, but
> maybe it should mention handling of default partitions etc.?
>

Done. Please check the wording. It might need some word smithy.

>
> There certainly needs to be some description of the algorithm somewhere,
> either in a README or before a suitable function. It doesn't have to be
> particularly detailed, a rough outline of the matching would be enough,
> so that readers don't have to rebuild the knowledge from pieces
> scattered around various comments.
>

The algorithm for list and range partitioned tables is slightly different.
So, I have added separate prologue to each list_merge_bounds() and
range_merge_bounds(). Please check if that serves the purpose.

>
> 2) Do we need another GUC enabling this more complex algorithm? In PG11
> the partitionwise join is disabled by default, on the grounds that it's
> expensive and not worth it by default. How much more expensive is this?
> Maybe it makes sense to allow enabling only the "simple" approach?
>

We have reduced the complexity of merging bounds quite a bit so this
shouldn't be costly. Further more we handle the usual case of equal bounds
quickly using the merge flag so most of the cases should be fine. It's only
when two partitioned tables with same partition scheme are joined but do
not have merge-able bounds that this algorithm would not yield useful
result - but that would be rare in the field. enable_partitionwise_join =
false should take care of such scenarios easily. I am not in favour of
adding another GUC which we set to false by default and then take another
few releases to make it true by default.

>
> 3) This comment in try_partitionwise_join is now incorrect, because the
> condition may be true even for partitioned tables with (nparts == 0).
>
> /* Nothing to do, if the join relation is not partitioned. */
> if (joinrel->part_scheme == NULL || joinrel->nparts == 0)
> return;
>
>
If part_scheme = NULL, npart should be 0, fixed that in
build_joinrel_partition_info(). If partscheme != NULL but bounds can not be
merged, nparts = 0. So this condition is correct. Encapsulated in a macro
IS_JOINREL_NOT_PARTITITIONED(). and added comments for the macro. Given
that the macro is used exactly at one place, it may not be necessary to
define it but it looks *nice*.

> Moreover, the condition used to be
>
> if (!IS_PARTITIONED_REL(joinrel))
> return;
>
> which is way more readable. I think it's net negative to replace these
> "nice" macros with clear meaning with complex conditions. If needed, we
> can invent new macros. There are many other places where the patch
> replaces macros with less readable conditions.
>
>
The only other place where we have replaced a *nice* macro is in
build_joinrel_partition_info(). But I think it's a legit replacement. I
have added a comment there.

>
> 4) I'm a bit puzzled how we could get here with non-partitioned rels?
>
> /*
> * We can not perform partitionwise join if either of the joining
> relations
> * is not partitioned.
> */
> if (!IS_PARTITIONED_REL(rel1) || !IS_PARTITIONED_REL(rel2))
> return;
>

See the comment I have added in build_joinrel_partition_info(). Not all
joining pairs for a given relation are partitioned.

>
> 5) I find the "merged" flag in RelOptInfo rather unclear, because it
> does not clearly indicate what was merged. Maybe something like
> partbounds_merged would be better?
>

Done.

>
> 6) The try_partitionwise_join function is getting a bit too long and
> harder to understand. The whole block in
>
> if (joinrel->nparts == -1)
> {
> ...
> }
>
> seems rather well isolated, so I propose to move it to a separate
> function responsible only for the merging. We can simply call it on the
> joinrel, and make it return right away if (joinrel->nparts == -1).
>

Looks like you have already taken care of this one in one of your patches.

>
> 7) I'd suggest not to reference exact functions in comments unless
> abolutely necessary, because it's harder to maintain and it does not
> really explain purpose of the struct/code. E.g. consider this:
>
> /* Per-partitioned-relation data for
> merge_list_bounds()/merge_range_bounds() */
> typedef struct PartitionMap
> { ... }
>
> Why does it matter where is the struct used? That's pretty trivial to
> find using 'git grep' or something. Instead the comment should explain
> the purpose of the struct.
>

Adjusted names and comments a bit.

--
Best Wishes,
Ashutosh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-04-01 17:27:56 Re: snapshot too old issues, first around wraparound and then more.
Previous Message Peter Geoghegan 2020-04-01 17:03:37 Re: snapshot too old issues, first around wraparound and then more.