Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-03-30 09:35:59
Message-ID: CAFjFpRd6Kzx6Xn=7vdwwnh6rEw2VEgo--iPdhV+Fb7bHfPzsbw@mail.gmail.com

> On Wed, Mar 29, 2017 at 8:39 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I don't think 0011 is likely to be acceptable in current form. I
> can't imagine that we just went to the trouble of getting rid of
> AppendRelInfos for child partitioned rels only to turn around and put
> them back again. If you just need the parent-child mappings, you can
> get that from the PartitionedChildRelInfo list.
>

Please refer to my earlier mails on this subject [1], [2]. For
multi-level partition-wise join, we need the RelOptInfo of a
partitioned table to contain the RelOptInfos of its immediate
partitions. I have not seen any arguments against creating
RelOptInfos for intermediate partitioned tables. We create child
RelOptInfos only for entries in root->append_rel_list, i.e. only for
those relations which have an AppendRelInfo. Since we are not creating
AppendRelInfos for partitioned partitions, we do not create
RelOptInfos for those either. So, to me it looks like we have to
either have AppendRelInfos for partitioned partitions or create
RelOptInfos by traversing some other list, like the
PartitionedChildRelInfo list. It looks odd to walk
root->append_rel_list as well as this new list for creating
RelOptInfos. But suppose, for a moment, that we do walk this other
list. That list is also lossy: it stores only the topmost parent of
any of the partitioned partitions, not the immediate parent, which is
what we need in order to add the RelOptInfos of immediate children to
the RelOptInfo of their parent.
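
As a rough illustration of the lossiness (not code from the patch),
here is a minimal C sketch. Everything in it is an assumption made for
the example: ToyLink is a toy struct, not PostgreSQL's AppendRelInfo
or PartitionedChildRelInfo, and the three-level hierarchy T -> t1 ->
t11 with its relids is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical relids for a three-level hierarchy: T -> t1 -> t11. */
enum { REL_T = 1, REL_T1 = 2, REL_T11 = 3 };

/* Toy parent/child link; NOT a real PostgreSQL planner struct. */
typedef struct ToyLink
{
    int parent_relid;
    int child_relid;
} ToyLink;

/* One link per immediate parent, as per-level AppendRelInfos would record. */
static const ToyLink toy_immediate_links[] = {
    {REL_T, REL_T1},
    {REL_T1, REL_T11},
};

/* Links that remember only the topmost parent of each partitioned child. */
static const ToyLink toy_topmost_links[] = {
    {REL_T, REL_T1},
    {REL_T, REL_T11},
};

/* Return the parent recorded for child_relid, or 0 if there is no entry. */
static int
toy_recorded_parent(const ToyLink *links, size_t nlinks, int child_relid)
{
    size_t i;

    assert(links != NULL);
    for (i = 0; i < nlinks; i++)
    {
        if (links[i].child_relid == child_relid)
            return links[i].parent_relid;
    }
    return 0;
}
```

With the immediate links, looking up t11 yields t1, so t11's
RelOptInfo could be attached to t1's. With the topmost-only links the
same lookup yields T, and the t1 -> t11 edge cannot be recovered.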

Coming back to the PartitionedChildRelInfo list as a way to maintain
the parent-child relationship: all the code assumes that the
parent-child relationship is stored in AppendRelInfos linked into
root->append_rel_list, and walks that list to find the children of a
given parent or the parent(s) of a given child. We would have to
modify all those places to traverse two lists instead of one. Some of
those even return an AppendRelInfo structure; they would then
sometimes return an AppendRelInfo and sometimes a
PartitionedChildRelInfo. That looks ugly.

Consider a case where P has partitions p1 and p2, which in turn have
partitions p11, p12 and p21, p22 respectively. Another partitioned
table Q has partitions q1 and q2. q1 is further partitioned into q11
and q12, but q2 is not partitioned. The partitioning schemes of P and
Q match, and so do those of p1 and q1. So, a partition-wise join
between P and Q would look like P J Q = append(p11 J q11, p12 J q12,
p2 J q2), with p2 J q2 being append(p21, p22) J q2. When constructing
the restrictlist (and other clauses) for p2 J q2, we need to translate
the restrictlist applicable to P J Q. This translation requires the
AppendRelInfo of p2, which does not exist today. We cannot use
PartitionedChildRelInfo because it doesn't have information about how
to translate Vars of P to those of p2.
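
To show what "translating Vars of P to those of p2" involves, here is
a small C sketch under stated assumptions. ToyVar and ToyAppendRelInfo
are simplified stand-ins invented for this example; the real
AppendRelInfo carries its mapping in translated_vars as a list of
expressions, which the toy replaces with a plain array of attribute
numbers. The relids and the column order used below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ATTRS 8

/* Toy Var: (range table index, attribute number). */
typedef struct ToyVar
{
    int varno;
    int varattno;
} ToyVar;

/* Toy stand-in for the parts of AppendRelInfo that matter here. */
typedef struct ToyAppendRelInfo
{
    int parent_relid;
    int child_relid;
    int nattrs;
    /* child_attno[i] is the child's attno for parent attno i + 1 */
    int child_attno[TOY_MAX_ATTRS];
} ToyAppendRelInfo;

/*
 * Rewrite a Var that refers to the parent so that it refers to the
 * child.  Vars of any other relation are returned unchanged.
 */
static ToyVar
toy_translate_var(const ToyAppendRelInfo *appinfo, ToyVar var)
{
    assert(appinfo != NULL);
    if (var.varno != appinfo->parent_relid)
        return var;
    assert(var.varattno >= 1 && var.varattno <= appinfo->nattrs);
    var.varno = appinfo->child_relid;
    var.varattno = appinfo->child_attno[var.varattno - 1];
    return var;
}
```

If P has relid 1, p2 has relid 3, and p2 stores P's two columns in the
opposite order, the mapping {2, 1} turns the Var (1, 1) into (3, 2). A
structure that records only relids has no such per-column mapping,
which is why it cannot drive this translation.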

I don't see a way to avoid creating AppendRelInfos for partitioned partitions.

[1] https://www.postgresql.org/message-id/CAFjFpRefs5ZMnxQ2vP9v5zOtWtNPuiMYc01sb1SWjCOB1CT%3DuQ%40mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
