Re: Partitioned tables and relfilenode

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partitioned tables and relfilenode
Date: 2017-03-14 08:18:01
Message-ID: 975b7629-ff74-74df-a355-23c2e5f193a4@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/03/13 19:30, Amit Langote wrote:
>> Here is the updated patch.
>>
>> Since this patch proposes to avoid creating scan nodes for non-leaf tables
>> in a partition tree, they won't be referenced anywhere in the resulting
>> plan tree. So the executor will not lock those tables in the
>> select/update/delete cases. Insert is different since we lock all tables
>> in the partition tree when setting up tuple-routing in
>> ExecInitModifyTable. Not taking executor locks on the tables means that
>> the cached plans that should be invalidated upon adding/removing a
>> partition somewhere in the partition tree won't be.
>>
>> First I thought that we could remember just the root table RT index using
>> a new Index field partitionRoot in Append, MergeAppend, and ModifyTable
>> nodes and use it to locate and lock the root table during executor node
>> initialization. But soon realized that that's not enough, because it
>> ignores the fact that adding/removing partitions at lower levels does not
>> require taking a lock on the root table; only the immediate parent. So
>> any cached select/update/delete plans won't be invalidated when a new
>> lower-level partition is added/removed, because the immediate parent would
>> not have been added to the query's range table and hence the
>> PlannedStmt.relationOids list. Remember that the latter is used by
>> plancache.c to remember the relations that a given cached plan depends on
>> remaining unchanged. So the patch now adds a list member called
>> partitioned_rels to Append, MergeAppend, and ModifyTable nodes and stores
>> the RT indexes of all the non-leaf tables in a partition tree with root
>> table RT index at the head (note that these RT indexes are of the RTEs
>> added by expand_inherited_rtenrty; also see below). Since the
>> partitioned_rels list is constructed when building paths and must be
>> propagated to the plan nodes, the same field is also present in the
>> corresponding Path nodes. ExecInit* routines for the aforementioned nodes
>> now locate and lock the non-leaf tables using the RT indexes in
>> partitioned_rels. Leaf tables are locked, as before, either by InitPlan
>> (update/delete result relations case) or by ExecInitAppend or
>> ExecInitMergeAppend when initializing the appendplans/mergeplans (select
>> case).
>>
>> The previous proposal was for expand_inherited_rtentry to not create RT
>> entries and AppendRelInfo's for the non-leaf tables, but I think that
>> doesn't work, as I tried to explain above. We need RTEs because that
>> seems to be the only way currently for informing the executor of the
>> non-leaf tables. We need AppendRelInfo's to store the RT indexes of those
>> RTEs for the latter planning steps to collect them in partitioned_rels
>> mentioned above. So with the latest patch, we do create the RT entry and
>> AppendRelInfo for non-leaf tables. AppendRelInfo created in this case is
>> a minimal one; only parent_relid and child_relid are valid. To make the
>> latter planning steps ignore these minimal AppendRelInfo's, every
>> AppendRelInfo is now marked with child_relkind. Only
>> set_append_rel_pathlist() and inheritance_planner() process them to
>> collect the child_relid into the partitioned_rels list to be stored in
>> AppendPath/MergeAppendPath and ModifyTablePath, respectively.
>
> Sorry, forgot to attach the patches themselves. Attached this time.

I made some changes to 0001:

* No need to create RelOptInfo's for partitioned RTEs added by
expand_inherited_rtentry

* Instead of storing relkind of child tables in AppendRelInfo's, simply
store if they are partitioned; IOW, change child_relkind to
child_is_partitioned

Thanks,
Amit

Attachment Content-Type Size
0001-Avoid-creating-scan-nodes-for-partitioned-tables.patch text/x-diff 54.2 KB
0002-Do-not-allocate-storage-for-partitioned-tables.patch text/x-diff 3.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-03-14 08:20:53 Re: exposing wait events for non-backends (was: Tracking wait event for latches)
Previous Message tushar 2017-03-14 07:31:47 Re: increasing the default WAL segment size