Re: Partition-wise join for join between (declaratively) partitioned tables

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-03-30 10:32:21
Message-ID: 47209ad1-c4f0-23c3-223f-331878a750c0@lab.ntt.co.jp

On 2017/03/30 18:35, Ashutosh Bapat wrote:
> On Wed, Mar 29, 2017 at 8:39 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I don't think 0011 is likely to be acceptable in current form. I
>> can't imagine that we just went to the trouble of getting rid of
>> AppendRelInfos for child partitioned rels only to turn around and put
>> them back again. If you just need the parent-child mappings, you can
>> get that from the PartitionedChildRelInfo list.
>>
>
> Please refer to my earlier mails on this subject [1], [2]. For
> multi-level partition-wise join, we need RelOptInfo of a partitioned
> table to contain RelOptInfo of its immediate partitions. I have not
> seen any counter arguments not to create RelOptInfos for intermediate
> partitioned tables. We create child RelOptInfos only for entries in
> root->append_rel_list i.e. only for those relations which have an
> AppendRelInfo. Since we are not creating AppendRelInfos for
> partitioned partitions, we do not create RelOptInfos for those. So, to
> me it looks like we have to either have AppendRelInfos for partitioned
> partitions or create RelOptInfos by traversing some other list like
> PartitionedChildRelInfo list. It looks odd to walk
> root->append_rel_list as well as this new list for creating
> RelOptInfos. But for a moment, we assume that we have to walk this
> other list. But then that other list is also lossy. It stores only the
> topmost parent of any of the partitioned partitions and not the
> immediate parent as required to add RelOptInfos of immediate children
> to the RelOptInfo of a parent.

So, because we want to create an Append path for each partitioned table in
a tree separately, we'll need a RelOptInfo for each one, which in turn
requires an AppendRelInfo. Note that we do that only for those
partitioned child RTEs that have inh set to true, so that all later
stages treat each of them as a parent rel to create an Append path for.
There would still be partitioned child RTEs with inh set to false for
which, just like before, no AppendRelInfos and RelOptInfos are created;
they get added as the only member of partitioned_rels in the
PartitionedChildRelInfo of each partitioned table. Finally, when the
Append path for the root parent is created, its subpaths list will contain
paths of leaf partitions of all levels and its partitioned_rels list
should contain the RT indexes of partitioned tables of all levels.

If we have the following partition tree:

      A
    / | \
   B  C  D
     / \
    E   F

The following RTEs will be created, in that order. RTEs with inh=true are
shown with suffix _i. RTEs that get an AppendRelInfo (& a RelOptInfo) are
shown with suffix _a.

A_i_a
A
B_a
C_i_a
C
E_a
F_a
D_a
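
As a rough illustration of the ordering above, here is a toy sketch in
Python. It is not planner code: the dictionary TREE and the functions
expand_rtes and collect are hypothetical names made up for this example.
It assumes a depth-first walk in which every partitioned table first gets
an inh=true RTE (with an AppendRelInfo), then a duplicate inh=false RTE
(without one), followed by its children, and it reuses the _i / _a
suffixes from the listing.

```python
# Toy model of the RTE ordering described in this mail.
# All names here are hypothetical; none of this is PostgreSQL source.

# Partition tree from the example: A has partitions B, C, D; C has E, F.
TREE = {"A": ["B", "C", "D"], "C": ["E", "F"]}


def expand_rtes(rel, tree):
    """Return RTE labels in creation order for the subtree rooted at rel."""
    children = tree.get(rel, [])
    if not children:
        # Leaf partition: a single RTE that gets an AppendRelInfo.
        return [f"{rel}_a"]
    # Partitioned table: inh=true RTE with an AppendRelInfo, then the
    # duplicate inh=false RTE for which nothing further is created.
    rtes = [f"{rel}_i_a", rel]
    for child in children:
        rtes.extend(expand_rtes(child, tree))
    return rtes


def collect(rel, tree):
    """Return (leaf partitions, partitioned tables) over all levels."""
    children = tree.get(rel, [])
    if not children:
        return [rel], []
    leaves, partitioned = [], [rel]
    for child in children:
        child_leaves, child_partitioned = collect(child, tree)
        leaves.extend(child_leaves)
        partitioned.extend(child_partitioned)
    return leaves, partitioned


rte_order = expand_rtes("A", TREE)
leaves, partitioned = collect("A", TREE)
print(rte_order)
print(leaves, partitioned)
```

In this model, leaves stands in for the relations whose paths would end up
in the root Append path's subpaths list, and partitioned for the tables
whose RT indexes would end up in its partitioned_rels list. Table names
are used instead of real RT indexes, since those depend on the query's
range table.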

Thanks,
Amit
