Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-08-03 16:08:37
Message-ID: CAFjFpRc0W9vGOJd+jFhnzLsvOr-m2jJLRx60hdMZUtKD9R-oZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 3, 2017 at 2:10 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jul 31, 2017 at 7:59 AM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> Adding AppendRelInfos to root->append_rel_list as and when they are
>> created would keep parent AppendRelInfos before those of children. But
>> that function throws away the AppendRelInfo it created when their are
>> no real children i.e. in partitioned table's case when has no leaf
>> partitions. So, we can't do that. Hence, I chose to change the API to
>> return the list of AppendRelInfos when the given RTE has real
>> children.
>
> So, IIUC, the case you're concerned about is when you have a hierarchy
> of only partitioned tables, with no plain tables. For example, if B
> is a partitioned table and a partition of A, and that's all there is,
> A will recurse to B and B will return NIL.
>
> Is it necessary to get rid of the extra AppendRelInfos, or are they
> harmless like the duplicate RTE and PlanRowMark nodes?
>

Actually there are two sides to this:

If there are no leaf partitions, without the patch two things happen
1. rte->inh is cleared and 2 no appinfo is added to the
root->append_rel_list, even though harmless RTE and PlanRowMark nodes
are created. The first avoids treating the relation as the inheritance
parent and thus avoids creating any child relations and paths, saving
a lot of work. Ultimately set_rel_size() marks such a relation as
dummy
352 else if (rte->relkind == RELKIND_PARTITIONED_TABLE)
353 {
354 /*
355 * A partitioned table without leaf
partitions is marked
356 * as a dummy rel.
357 */
358 set_dummy_rel_pathlist(rel);
359 }

Since root->append_rel_list is traversed for every inheritance parent,
not adding needless AppendRelInfos improves performance and saves
memory, (FWIW or consider a case where there are thousands of
partitioned partitions without any leaf partition.).

My initial thought was to keep both these properties intact. But then
removing such AppendRelInfos would have a problem when such a table is
on the inner side of the join as described in [1]. So I wrote the
patch not to do either of those things when there are partitioned
partitions without leaf partitions. So, it looks like you are correct,
we could just go ahead and add those AppendRelInfos directly to
root->append_rel_list.

> /*
> * If all the children were temp tables or a partitioned parent did not
> * have any leaf partitions, pretend it's a non-inheritance situation; we
> * don't need Append node in that case. The duplicate RTE we added for
> * the parent table is harmless, so we don't bother to get rid of it;
> * ditto for the useless PlanRowMark node.
> */
> if (!need_append)
> {
> /* Clear flag before returning */
> rte->inh = false;
> return;
> }
>
> If we do need to get rid of the extra AppendRelInfos, maybe a less
> invasive solution would be to change the if (!need_append) case to do
> root->append_rel_list = list_truncate(root->append_rel_list,
> original_append_rel_length).

We might require this for non-partitioned tables. I will try to
implement it this way in the next set of patches.

>
>> The code avoids creating AppendRelInfos for a child which represents
>> the parent in its role as a simple member of inheritance set.
>
> OK, I suggest we rewrite the whole comment like this: "We need an
> AppendRelInfo if paths will be built for the child RTE. If
> childrte->inh is true, then we'll always need to generate append paths
> for it. If childrte->inh is false, we must scan it if it's not a
> partitioned table; but if it is a partitioned table, then it never has
> any data of its own and need not be scanned. It does, however, need
> to be locked, so note the OID for inclusion in the
> PartitionedChildRelInfo we're going to build."

Done.

>
> It looks like you also need to update the header comment for
> AppendRelInfo itself, in nodes/relation.h.

Done. Thanks for pointing it out.

>
> + * PlannerInfo for every child is obtained by translating relevant members
>
> Insert "The" at the start of the sentence.

Done.

>
> - subroot->parse = (Query *)
> - adjust_appendrel_attrs(root,
> - (Node *) parse,
> - appinfo);
> + subroot->parse = (Query *) adjust_appendrel_attrs(parent_root,
> + (Node *)
> parent_parse,
> + 1, &appinfo);
>
> I suggest that you don't remove the line break after the cast.

This is part of 0001 patch, fixed there.

>
> + * If the child is further partitioned, remember it as a parent. Since
> + * partitioned tables do not have any data, we don't need to create
> + * plan for it. We just need its PlannerInfo set up to be used while
> + * planning its children.
>
> Most of this comment is in the singular, but the first half of the
> second sentence is plural. Should be "Since a partitioned table does
> not have any data...". I might replace the last sentence by "We do,
> however, need to remember the PlannerInfo for use when planning its
> children."

Done.

>
> +-- Check UPDATE with *multi-level partitioned* inherited target
>
> Asterisks seem like overkill.

Done.

This style was copied from an existing comment in that file.
-- Check UPDATE with *partitioned* inherited target

>
> Since expand_inherited_rtentry() and set_append_rel_size() can now
> recurse down to as many levels as there are levels in the inheritance
> hierarchy, they should probably have a check_stack_depth() check.

Done. Even without this patch set_append_rel_size() could recurse down
many levels of inheritance hierarchy (created by set operation
queries) through
set_append_rel_size()->set_rel_size()->set_append_rel_size(). And so
would set_rel_size(). But now it's more prone to that problem.

I will provide updated patches after taking care of your comments
about 0005 and 0006.

[1] https://www.postgresql.org/message-id/CAFjFpRd5+zroxY7UMGTR2M=rjBV4aBOCxQg3+1rBmTPLK5mpDg@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2017-08-03 16:08:48 Re: reload-through-the-top-parent switch the partition table
Previous Message Robert Haas 2017-08-03 16:06:17 Re: map_partition_varattnos() and whole-row vars