Re: [HACKERS] advanced partition matching algorithm for partition-wise join

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] advanced partition matching algorithm for partition-wise join
Date: 2017-12-03 11:23:36
Message-ID: CAFjFpReJhFSoy6DqH0ipFSHd=sLNEkSzAtz4VWCaS-w2jZL=uw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 13, 2017 at 7:59 AM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Thu, Oct 12, 2017 at 9:46 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Oct 11, 2017 at 7:08 AM, Ashutosh Bapat
>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>> Here's updated patch set based on the basic partition-wise join
>>> committed. The patchset applies on top of the patch to optimize the
>>> case of dummy partitioned tables [1].
>>>
>>> Right now, the advanced partition matching algorithm bails out when
>>> either of the joining relations has a default partition.
>>
>> So is that something you are going to fix?
>>
>
> Yes, if time permits. I had left the patch unattended while basic
> partition-wise join was getting committed. Now that it's committed, I
> rebased it. It still has TODOs and some work is required to improve
> it. But for the patch to be really complete, we have to deal with the
> problem of missing partitions described before. I am fine
> collaborating if someone else wants to pick it up.
>

Here's a patchset which supports advanced partition matching for
partition bounds with a default partition. The patchset is rebased on
the latest head.

When a list value is present in one of the joining relations and not
the other, and the other relation has a default partition, match (join)
the partition containing that list value with the default partition,
since the default partition may contain rows with that list value. If
the default partition happens to be on the outer side of the join, the
resulting join partition acts as a default partition as it will
contain all the values from the default partition. If the partition
containing the list value happens to be on the outer side of the join,
the resulting join partition is associated with the list value, since
no other partition key value from the default partition makes it to
the join result.
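
As an illustration only (this is not code from the patch), the list rule
above can be sketched in Python. The ListSide class, the DEFAULT marker,
and the function name are hypothetical, and the sketch assumes one
partition per list value; the actual matching works on partition bound
structures in C.

```python
from typing import Dict, Optional, Tuple

DEFAULT = "DEFAULT"  # hypothetical marker: the join partition acts as a default partition


class ListSide:
    """Hypothetical model of one list-partitioned relation."""

    def __init__(self, values: Dict[str, str], default: Optional[str] = None):
        self.values = values    # list value -> name of the partition holding it
        self.default = default  # name of the default partition, if any


def match_list_value(
    value: str, outer: ListSide, inner: ListSide
) -> Optional[Tuple[str, str, str]]:
    """Return (outer_partition, inner_partition, join_bound) for one list value.

    Returns None when the value cannot be paired on both sides.
    """
    o = outer.values.get(value)
    i = inner.values.get(value)
    if o is not None and i is not None:
        # Value present on both sides: the plain matching case.
        return (o, i, value)
    if o is not None and inner.default is not None:
        # The value's partition is on the outer side: the join partition
        # stays associated with that list value.
        return (o, inner.default, value)
    if i is not None and outer.default is not None:
        # The default partition is on the outer side: the join partition
        # acts as a default partition.
        return (outer.default, i, DEFAULT)
    return None
```

For example, with value 'b' present only on the inner side and a default
partition on the outer side, the sketch pairs the outer default with the
inner partition holding 'b' and marks the result as a default partition.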

When a range is present (completely or partly) in one of the joining
relations and not the other, and the other relation has a default
partition, match (join) the partition corresponding to that range with
the default partition. If the default partition happens to be on the
outer side of the join, the resulting join partition acts as a default
partition as it will contain all the values from the default
partition. If the non-default partition corresponding to the range happens to
be on the outer side of the join, the resulting join partition is
associated with that range, since partition key values from the
default partition outside that range won't make it to the join result.
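
A rough Python sketch of the range rule above, again only as an
illustration and not taken from the patch. It assumes integer keys and
half-open [lo, hi) ranges; missing_pieces(), join_bound(), and the
DEFAULT marker are hypothetical names.

```python
from typing import List, Tuple, Union

Range = Tuple[int, int]  # half-open [lo, hi); integer keys are an assumption
DEFAULT = "DEFAULT"      # hypothetical marker: join partition acts as a default partition


def missing_pieces(rng: Range, other_ranges: List[Range]) -> List[Range]:
    """Return the parts of rng not covered by the other side's explicit ranges."""
    lo, hi = rng
    gaps = []
    cur = lo
    for olo, ohi in sorted(other_ranges):
        if ohi <= cur:
            continue          # entirely before the uncovered part
        if olo >= hi:
            break             # entirely after rng
        if olo > cur:
            gaps.append((cur, olo))
        cur = max(cur, ohi)
        if cur >= hi:
            break
    if cur < hi:
        gaps.append((cur, hi))
    return gaps


def join_bound(
    rng: Range,
    other_ranges: List[Range],
    other_has_default: bool,
    range_on_outer: bool,
) -> Union[Range, str, None]:
    """Bound of the join partition when rng is paired with the other side's default.

    Returns None when no pairing with the default partition applies.
    """
    if not other_has_default or not missing_pieces(rng, other_ranges):
        return None
    # Range partition on the outer side: the join keeps that range.
    # Default partition on the outer side: the join acts as a default partition.
    return rng if range_on_outer else DEFAULT
```

Note that a range which is only partly missing on the other side would be
paired with both an explicit partition and the default partition there.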

If both the relations have a default partition, match (join) the default
partitions with each other and deem the resulting join partition a
default partition. If one of the relations has a default partition but
not the other, and the default partition happens to be on the outer
side of the join, all its rows will make it to the join. Such a
default partition may get joined to a non-default partition from the
inner side, if the inner side has a range missing on the outer side.

If any of the above causes multiple partitions from one side to match
with one or more partitions on the other side, we won't use
partition-wise join as discussed in the first mail of this thread.
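
The bail-out condition can be pictured as a one-to-one check over the
matched pairs. This is a minimal Python sketch under the assumption that
matching yields (outer_partition, inner_partition) name pairs; the
function name is hypothetical and the patch itself does not work this way
literally.

```python
from collections import Counter
from typing import Iterable, Tuple


def can_use_partitionwise_join(pairs: Iterable[Tuple[str, str]]) -> bool:
    """True when every partition is matched with at most one partition on the other side."""
    unique_pairs = set(pairs)  # the same pair may be reported more than once
    outer_counts = Counter(o for o, _ in unique_pairs)
    inner_counts = Counter(i for _, i in unique_pairs)
    return all(c == 1 for c in outer_counts.values()) and all(
        c == 1 for c in inner_counts.values()
    )
```

For instance, an outer default partition matched with two different inner
partitions makes the check fail, so partition-wise join would not be used.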

I have tested the patches for two-way joins, but haven't added any tests
involving default partitions to the patch itself. It needs to be
tested for N-way joins as well. So, for now I have kept the two patches
supporting default partitions for range and list partitioning,
respectively, separate. Also, some of the code duplication in partition matching
functions can be avoided using macros. I will merge those patches into
the main patch and add macros once they are tested appropriately.

For hash partitioned tables, we haven't implemented the advanced
partition matching, since it would be rare that somebody has hash
partitioned tables with holes (even if they are allowed).

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_adv_dp_join_patches_v2.tar.gz application/x-gzip 42.4 KB
