Re: [HACKERS] advanced partition matching algorithm for partition-wise join

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] advanced partition matching algorithm for partition-wise join
Date: 2018-02-07 04:51:58
Message-ID: CAFjFpRcA=t99o1uHVC=J03KtPzs7H36HYwST8yg6kh5wCO-V2g@mail.gmail.com
Lists: pgsql-hackers

Here's a new patchset with the following changes:

1. Rebased on the latest HEAD, taking care of the changes to the partition
bound comparison functions.
2. Refactored the code to avoid duplication.
3. Added an extensive test set (provided by Rajkumar), which is not meant
to be committed. It contains test cases which crash or reveal bugs. I
will fix those crashes and add the corresponding test cases to
partition_join.sql.

TODO
1. Fix the crashes/bugs revealed by the test cases.

On Sun, Dec 3, 2017 at 4:53 PM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Fri, Oct 13, 2017 at 7:59 AM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> On Thu, Oct 12, 2017 at 9:46 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Wed, Oct 11, 2017 at 7:08 AM, Ashutosh Bapat
>>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>>> Here's an updated patch set based on the basic partition-wise join
>>>> that has been committed. The patchset applies on top of the patch to
>>>> optimize the case of dummy partitioned tables [1].
>>>>
>>>> Right now, the advanced partition matching algorithm bails out when
>>>> either of the joining relations has a default partition.
>>>
>>> So is that something you are going to fix?
>>>
>>
>> Yes, if time permits. I had left the patch unattended while basic
>> partition-wise join was getting committed. Now that it's committed, I
>> have rebased it. It still has TODOs, and some work is required to
>> improve it. But for the patch to be really complete, we have to deal
>> with the problem of missing partitions described earlier. I am happy to
>> collaborate if someone else wants to pick it up.
>>
>
> Here's a patchset which supports advanced partition matching for
> partition bounds with a default partition. The patchset is rebased on
> the latest head.
>
> When a list value is present in one of the joining relations and not
> the other, and the other relation has a default partition, match (join)
> the partition containing that list value with the default partition,
> since the default partition may contain rows with that list value. If
> the default partition happens to be on the outer side of the join, the
> resulting join partition acts as a default partition, as it will
> contain all the values from the default partition. If the partition
> containing the list value happens to be on the outer side of the join,
> the resulting join partition is associated with that list value, since
> no other partition key value from the default partition makes it to
> the join result.
>
> When a range is present (completely or partly) in one of the joining
> relations and not the other, and the other relation has a default
> partition, match (join) the partition corresponding to that range with
> the default partition. If the default partition happens to be on the
> outer side of the join, the resulting join partition acts as a default
> partition, as it will contain all the values from the default
> partition. If the non-default partition corresponding to the range
> happens to be on the outer side of the join, the resulting join
> partition is associated with that range, since partition key values
> from the default partition outside that range won't make it to the
> join result.
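The range case can be sketched in a similar way. The following is a hypothetical C model, not the patch's code. It assumes integer bounds with PostgreSQL's usual inclusive-lower/exclusive-upper semantics, and the names RangePart, range_is_covered() and matching_partitions() are invented for illustration. Given one range partition [lo, hi) from one side, the sketch lists every overlapping partition on the other side and adds the other side's default partition only when part of [lo, hi) is not covered there.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a range partition covering [lo, hi). */
typedef struct RangePart
{
    int     lo;     /* inclusive lower bound */
    int     hi;     /* exclusive upper bound */
} RangePart;

/* True if [lo, hi) is completely covered by parts[0 .. nparts-1], which
 * must be sorted by lo and non-overlapping. */
static bool
range_is_covered(const RangePart *parts, int nparts, int lo, int hi)
{
    int     next = lo;      /* first value not yet known to be covered */

    for (int i = 0; i < nparts && next < hi; i++)
    {
        if (parts[i].hi <= next)
            continue;       /* partition lies entirely before next */
        if (parts[i].lo > next)
            return false;   /* gap [next, parts[i].lo) is missing */
        next = parts[i].hi;
    }
    return next >= hi;
}

/* Fill out[] with the partitions of the other relation that a partition
 * covering [lo, hi) must be joined with: every overlapping non-default
 * partition, plus the default partition (default_part, or -1 if none)
 * when some part of the range is missing.  out[] must have room for
 * nother + 1 entries.  Returns the number of entries written. */
static int
matching_partitions(const RangePart *other, int nother, int default_part,
                    int lo, int hi, int *out)
{
    int     n = 0;

    assert(lo < hi);
    for (int i = 0; i < nother; i++)
        if (other[i].lo < hi && lo < other[i].hi)
            out[n++] = i;
    if (default_part >= 0 && !range_is_covered(other, nother, lo, hi))
        out[n++] = default_part;
    return n;
}
```

For instance, with partitions [0, 10) and [20, 30) plus a default partition (index 2) on the other side, a partition covering [5, 25) overlaps both range partitions and also needs the default partition, because [10, 20) is missing. That is three matches, which is exactly the "multiple partitions" situation in which partition-wise join is not used.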
>
> If both relations have a default partition, match (join) the default
> partitions with each other and deem the resulting join partition a
> default partition. If one of the relations has a default partition but
> the other does not, and the default partition happens to be on the
> outer side of the join, all its rows will make it to the join. Such a
> default partition may get joined to a non-default partition from the
> inner side, if the inner side has a range missing in the outer side.
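The default-partition rules in the paragraph above can be sketched as a tiny decision function. This is a hypothetical C illustration, not the patch's logic. It assumes a left outer join, so that every outer row reaches the result, and the case where only the inner side has a default partition is an assumption not covered by the text: those inner rows are taken to match no outer row, so no default join partition is produced. The names DefaultPair and join_default_partition() are invented.

```c
#include <assert.h>

/* Hypothetical: the pair of partitions forming the join's default
 * partition; -1 means "none" (or, for inner_part, "decided elsewhere"). */
typedef struct DefaultPair
{
    int     outer_part;
    int     inner_part;
} DefaultPair;

/* outer_default / inner_default are default partition indexes, or -1.
 * Assumes a LEFT join (every outer row reaches the result). */
static DefaultPair
join_default_partition(int outer_default, int inner_default)
{
    DefaultPair d = {-1, -1};

    assert(outer_default >= -1 && inner_default >= -1);

    if (outer_default >= 0 && inner_default >= 0)
    {
        /* Both sides have one: join the default partitions to each other;
         * the result is the join's default partition. */
        d.outer_part = outer_default;
        d.inner_part = inner_default;
    }
    else if (outer_default >= 0)
    {
        /* Only the outer side has one: all its rows reach the result, so
         * it still yields a default join partition.  Any inner partner
         * (an inner range missing on the outer side) would be found by
         * the range/list matching, not here. */
        d.outer_part = outer_default;
    }
    /* Assumption: with only an inner default partition, its unmatched
     * rows never reach a LEFT join result, so there is no default. */
    return d;
}
```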
>
> If any of the above causes multiple partitions from one side to match
> with one or more partitions on the other side, we won't use
> partition-wise join as discussed in the first mail of this thread.
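One way to detect the bail-out condition is to record every matched pair in two maps and reject any pair that would give a partition a second, different partner. The C below is a minimal sketch of that idea only. PartitionMap, MAX_PARTS, partition_map_init() and partition_map_add() are hypothetical names, and the fixed-size arrays are a simplification.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PARTS 16    /* hypothetical fixed limit, for the sketch only */

typedef struct PartitionMap
{
    int     outer_to_inner[MAX_PARTS];  /* -1 = not matched yet */
    int     inner_to_outer[MAX_PARTS];
} PartitionMap;

static void
partition_map_init(PartitionMap *map)
{
    for (int i = 0; i < MAX_PARTS; i++)
    {
        map->outer_to_inner[i] = -1;
        map->inner_to_outer[i] = -1;
    }
}

/* Record that outer partition o must be joined with inner partition i.
 * Returns false if either partition is already matched to a different
 * partner, i.e. the matching is not one-to-one and partition-wise join
 * should not be used. */
static bool
partition_map_add(PartitionMap *map, int o, int i)
{
    assert(o >= 0 && o < MAX_PARTS && i >= 0 && i < MAX_PARTS);

    if (map->outer_to_inner[o] == i && map->inner_to_outer[i] == o)
        return true;    /* same pair again, e.g. from another list value */
    if (map->outer_to_inner[o] != -1 || map->inner_to_outer[i] != -1)
        return false;   /* a partition would match two partners */

    map->outer_to_inner[o] = i;
    map->inner_to_outer[i] = o;
    return true;
}
```

A caller would feed each pair produced by the list/range matching into partition_map_add() and fall back to a regular (non-partition-wise) join as soon as it returns false.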
>
> I have tested the patches for two-way joins, but haven't added any tests
> involving default partitions to the patch itself. It needs to be
> tested for N-way joins as well. So, for now I have kept the two patches
> supporting the default partition for range and list partitioning,
> respectively, separate. Also, some of the code duplication in the
> partition matching functions can be avoided using macros. I will merge
> those patches into the main patch and add macros once they are tested
> appropriately.
>
> For hash-partitioned tables, we haven't implemented the advanced
> partition matching, since it would be rare for somebody to have
> hash-partitioned tables with holes (even though they are allowed).
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_adv_dp_join_patches_v3.tar.gz application/x-gzip 149.3 KB