Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-09-04 12:32:14
Message-ID: CAFjFpRc9J+Dtw-tT6EW3uzsFiCvOHJj2g_PQeDLMvW1i9FVyDw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 4, 2017 at 10:04 AM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> On 2017/09/04 12:38, Etsuro Fujita wrote:
>> On 2017/09/02 4:10, Ashutosh Bapat wrote:
>>> This rebase mainly changes patch 0001, which translates partition
>>> hierarchy into inheritance hierarchy creating AppendRelInfos and
>>> RelOptInfos for partitioned partitions. Because of that, it's not
>>> necessary to record the partitioned partitions in a
>>> PartitionedChildRelInfos::child_rels. The only RTI that goes in there
>>> is the RTI of child RTE which is same as the parent RTE except inh
>>> flag. I tried removing that with a series of changes but it seems that
>>> following code in ExecInitModifyTable() requires it.
>>> 1897 /* The root table RT index is at the head of the
>>> partitioned_rels list */
>>> 1898 if (node->partitioned_rels)
>>> 1899 {
>>> 1900 Index root_rti;
>>> 1901 Oid root_oid;
>>> 1902
>>> 1903 root_rti = linitial_int(node->partitioned_rels);
>>> 1904 root_oid = getrelid(root_rti, estate->es_range_table);
>>> 1905 rel = heap_open(root_oid, NoLock); /* locked by InitPlan */
>>> 1906 }
>>> 1907 else
>>> 1908 rel = mtstate->resultRelInfo->ri_RelationDesc;
>>>
>>> I don't know whether we could change this code not to use
>>> PartitionedChildRelInfos::child_rels.
>
> For a root partitioned tables, ModifyTable.partitioned_rels comes from
> PartitionedChildRelInfo.child_rels recorded for the table by
> expand_inherited_rtnentry(). In fact, the latter is copied verbatim to
> ModifyTablePath (or AppendPath/MergeAppendPath) when creating the same.
> The only point of keeping those RT indexes around in the ModifyTable node
> is for the executor to be able to locate partitioned table RT entries and
> lock them. Without them, the executor wouldn't know about those tables at
> all, because there won't be subplans corresponding to partitioned tables
> in the tree and hence their RT indexes won't appear in the
> ModifyTable.resultRelations list. If your patch adds partitioned child
> rel AppendRelInfos back for whatever reason, you should also make sure in
> inheritance_planner() that their RT indexes don't end up the
> resultRelations list. See this piece of code in inheritance_planner():
>
> 1351 /* Build list of sub-paths */
> 1352 subpaths = lappend(subpaths, subpath);
> 1353
> 1354 /* Build list of modified subroots, too */
> 1355 subroots = lappend(subroots, subroot);
> 1356
> 1357 /* Build list of target-relation RT indexes */
> 1358 resultRelations = lappend_int(resultRelations,
> appinfo->child_relid);
>
> Maybe it won't happen, because if this appinfo corresponds to a
> partitioned child table, recursion would occur and we'll get to this piece
> of code for only the leaf children.

You are right. We don't execute above lines for partitioned partitions.

>
> By the way, if you want to get rid of PartitionedChildRelInfo, you can do
> that as long as you find some other way of putting together the
> partitioned_rels list to add into the ModifyTable (Append/MergeAppend)
> node created for the root partitioned table. Currently,
> PartitionedChildRelInfo (and the root->pcinfo_list) is the way for
> expand_inherited_rtentry() to pass that information to the planner's
> path-generating code. We may be able to generate that list when actually
> creating the path using set_append_rel_pathlist() or
> inheritance_planner(), without having created a PartitionedChildRelInfo
> node beforehand.

AFAIU, the list contained RTIs of the relations, which didnt' have
corresponding AppendRelInfos to lock those relations. Now that we
create AppendRelInfos even for partitioned partitions, I don't think
we need the list to take care of the locks. Is there any other reason
why we maintain that list (apart from the trigger case I have raised
and Fujita-san says that the list is not required in that case as
well.)

>
>> Though I haven't read the patch yet, I think the above code is useless.
>> And I proposed a patch to clean it up before [1]. I'll add that patch to
>> the next commitfest.
>
> +1.
+1. Will Fujita-san's patch also handle getting rid of partitioned_rels list?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2017-09-04 12:36:31 Re: Release Note changes
Previous Message Amit Kapila 2017-09-04 12:31:32 Re: WIP: long transactions on hot standby feedback replica / proof of concept