Re: Partition-wise join for join between (declaratively) partitioned tables

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-09-12 02:01:09
Message-ID: 17bb51b2-52bf-3a0b-d540-04051cf3e781@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/09/11 19:45, Ashutosh Bapat wrote:
> On Mon, Sep 11, 2017 at 12:16 PM, Amit Langote wrote:
>> IMHO, we should make it the responsibility of the future patch to set a
>> child PlanRowMark's prti to the direct parent's RT index, when we actually
>> know that it's needed for something. We clearly know today why we need to
>> pass the other objects like child RT entry, RT index, and Relation, so we
>> should limit this patch to pass only those objects to the recursive call.
>> That makes this patch a relatively easy to understand change.
>
> I think you are mixing two issues here 1. setting parent RTI in child
> PlanRowMark and 2. passing immediate parent's PlanRowMark to
> expand_single_inheritance_child().
>
> I have discussed 1 in my reply to Robert.
>
> About 2 you haven't given any particular comments to my reply. To me
> it looks like it's this patch that introduces the notion of
> multi-level expansion, so it's natural for this patch to pass
> PlanRowMark in cascaded fashion similar to other structures.

You patch does 2 to be able to do 1, doesn't it? That is, to be able to
set the child PlanRowMark's prti to the direct parent's RT index, you pass
the immediate parent's PlanRowMark to the recursive call of
expand_single_inheritance_child().

All I am trying to say is that this patch's mission is to expand
inheritance step-wise to be able to do certain things in the *planner*
that weren't possible before. The patch accomplishes that by creating
child AppendRelInfos such that its parent_relid field is set to the
immediate parent's RT index. It's quite clear why we're doing so. It's
not clear why we should do so for PlanRowMarks too. Maybe it's fine as
long as nothing breaks.

>> If we go with your patch, partitioned tables won't get locked, for
>> example, in case of the following query (p is a partitioned table):
>>
>> select 1 from p union all select 2 from p;
>>
>> That's because the RelOptInfos for the two instances of p in the above
>> query are RELOPT_OTHER_MEMBER_REL, not RELOPT_BASEREL. They are children
>> of the Append corresponding to the UNION ALL subquery RTE. So,
>> partitioned_rels does not get set per your proposed code.
>

[...]

> So, all partitioned partitions are getting locked correctly. Am I
> missing something?

Will reply to this separately to your other email.

> Actually, the original problem that caused this discussion started
> with an assertion failure in get_partitioned_child_rels() as
> Assert(list_length(result) >= 1);
>
> This assertion fails if result is NIL when an intermediate partitioned
> table is passed. May be we should assert (result == NIL ||
> list_length(result) == 1) and allow that function to be called even
> for intermediate partitioned partitions for which the function will
> return NIL. That will leave the code in add_paths_to_append_rel()
> simple. Thoughts?

Yeah, I guess that could work. We'll just have to write comments to
describe why the Assert is written that way.

>> In create_lateral_join_info():
>>
>> + Assert(IS_SIMPLE_REL(brel));
>> + Assert(brte);
>>
>> The second Assert is either unnecessary or should be placed first.
>
> simple_rte_array[] may have some NULL entries. Second assert makes
> sure that we aren't dealing with a NULL entry. Any particular reason
> to reorder the asserts?

Sorry, I missed that the 2nd Assert has b"rte". I thought it's b"rel".

>> The following comment could be made a bit clearer.
>>
>> + * In the case of table inheritance, the parent RTE is directly
>> linked
>> + * to every child table via an AppendRelInfo. In the case of table
>> + * partitioning, the inheritance hierarchy is expanded one level at a
>> + * time rather than flattened. Therefore, an other member rel
>> that is
>> + * a partitioned table may have children of its own, and must
>> + * therefore be marked with the appropriate lateral info so that
>> those
>> + * children eventually get marked also.
>>
>> How about: In the case of partitioned table inheritance, the original
>> parent RTE is linked, via AppendRelInfo, only to its immediate partitions.
>> Partitions below the first level are accessible only via their immediate
>> parent's RelOptInfo, which would be of kind RELOPT_OTHER_MEMBER_REL, so
>> consider those as well.
>
> I don't see much difference between those two. We usually do not use
> macros in comments, so usually comments mention "other member" rel.
> Let's leave this for the committer to judge.

Sure.

>> In expand_inherited_rtentry(), the following comment fragment is obsolete,
>> because we *do* now create AppendRelInfo's for partitioned children:
>>
>> + /*
>> + * We keep a list of objects in root, each of which maps a
>> partitioned
>> + * parent RT index to the list of RT indexes of its partitioned child
>> + * tables which do not have AppendRelInfos associated with those.
>
> Good catch. I have reworded it as
> /*
> * We keep a list of objects in root, each of which maps a root
> * partitioned parent RT index to the list of RT indexes of descendant
> * partitioned child tables.
>
> Does that look good?

Looks fine.

>> By the way, when we call expand_single_inheritance_child() in the
>> non-partitioned inheritance case, we should pass NULL for childrte_p,
>> childRTindex_p, childrc_p, instead of declaring variables that won't be
>> used. Hence, expand_single_inheritance_child() should make those
>> arguments optional.
>
> That introduces an extra "if" condition, which is costlier than an
> assignment. We have used both the styles in the code. Previously, I
> have got comments otherwise. So, I am not sure.

OK. expand_single_inheritance_child's header comment does not mention the
new result fields. Maybe add a comment describing what their role is and
that they're not optional arguments.

> I will update the patches once we have some resolution about 1. prti
> in PlanRowMarks and 2. detection of root partitioned table in
> add_paths_to_append_rel().

OK.

About 2, I somewhat agree with your proposed solution above, which might
be simpler to explain in comments than the code I proposed.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-09-12 02:03:54 Re: Still another race condition in recovery TAP tests
Previous Message Michael Paquier 2017-09-12 01:55:47 Removing wal_keep_segments as default configuration in PostgresNode.pm