Re: Partition-wise join for join between (declaratively) partitioned tables

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-09-04 04:34:37
Message-ID: cfc94b13-fd1a-fa0f-94e2-94790e8110a3@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/09/04 12:38, Etsuro Fujita wrote:
> On 2017/09/02 4:10, Ashutosh Bapat wrote:
>> This rebase mainly changes patch 0001, which translates partition
>> hierarchy into inheritance hierarchy creating AppendRelInfos and
>> RelOptInfos for partitioned partitions. Because of that, it's not
>> necessary to record the partitioned partitions in a
>> PartitionedChildRelInfos::child_rels. The only RTI that goes in there
>> is the RTI of child RTE which is same as the parent RTE except inh
>> flag. I tried removing that with a series of changes but it seems that
>> following code in ExecInitModifyTable() requires it.
>> 1897     /* The root table RT index is at the head of the
>> partitioned_rels list */
>> 1898     if (node->partitioned_rels)
>> 1899     {
>> 1900         Index       root_rti;
>> 1901         Oid         root_oid;
>> 1902
>> 1903         root_rti = linitial_int(node->partitioned_rels);
>> 1904         root_oid = getrelid(root_rti, estate->es_range_table);
>> 1905         rel = heap_open(root_oid, NoLock);  /* locked by InitPlan */
>> 1906     }
>> 1907     else
>> 1908         rel = mtstate->resultRelInfo->ri_RelationDesc;
>>
>> I don't know whether we could change this code not to use
>> PartitionedChildRelInfos::child_rels.

For a root partitioned tables, ModifyTable.partitioned_rels comes from
PartitionedChildRelInfo.child_rels recorded for the table by
expand_inherited_rtnentry(). In fact, the latter is copied verbatim to
ModifyTablePath (or AppendPath/MergeAppendPath) when creating the same.
The only point of keeping those RT indexes around in the ModifyTable node
is for the executor to be able to locate partitioned table RT entries and
lock them. Without them, the executor wouldn't know about those tables at
all, because there won't be subplans corresponding to partitioned tables
in the tree and hence their RT indexes won't appear in the
ModifyTable.resultRelations list. If your patch adds partitioned child
rel AppendRelInfos back for whatever reason, you should also make sure in
inheritance_planner() that their RT indexes don't end up the
resultRelations list. See this piece of code in inheritance_planner():

1351 /* Build list of sub-paths */
1352 subpaths = lappend(subpaths, subpath);
1353
1354 /* Build list of modified subroots, too */
1355 subroots = lappend(subroots, subroot);
1356
1357 /* Build list of target-relation RT indexes */
1358 resultRelations = lappend_int(resultRelations,
appinfo->child_relid);

Maybe it won't happen, because if this appinfo corresponds to a
partitioned child table, recursion would occur and we'll get to this piece
of code for only the leaf children.

By the way, if you want to get rid of PartitionedChildRelInfo, you can do
that as long as you find some other way of putting together the
partitioned_rels list to add into the ModifyTable (Append/MergeAppend)
node created for the root partitioned table. Currently,
PartitionedChildRelInfo (and the root->pcinfo_list) is the way for
expand_inherited_rtentry() to pass that information to the planner's
path-generating code. We may be able to generate that list when actually
creating the path using set_append_rel_pathlist() or
inheritance_planner(), without having created a PartitionedChildRelInfo
node beforehand.

> Though I haven't read the patch yet, I think the above code is useless.
> And I proposed a patch to clean it up before [1].  I'll add that patch to
> the next commitfest.

+1.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-09-04 04:46:17 Re: [Proposal] Allow users to specify multiple tables in VACUUM commands
Previous Message Michael Paquier 2017-09-04 04:15:46 Re: [bug fix] Savepoint-related statements terminates connection