Re: Partitioned tables and relfilenode

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partitioned tables and relfilenode
Date: 2017-03-13 10:30:21
Message-ID: a4e2be66-67c6-c8c0-080b-1af22a14b566@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/03/13 19:24, Amit Langote wrote:
> On 2017/03/10 17:57, Amit Langote wrote:
>> On 2017/03/08 22:36, Robert Haas wrote:
>>> On Wed, Mar 8, 2017 at 5:36 AM, Amit Langote wrote:
>>>>> - rel = mtstate->resultRelInfo->ri_RelationDesc;
>>>>> + nominalRTE = rt_fetch(node->nominalRelation, estate->es_range_table);
>>>>> + nominalRel = heap_open(nominalRTE->relid, NoLock);
>>>>>
>>>>> No lock?
>>>>
>>>> Hmm, I think I missed that a partitioned parent table would not be locked
>>>> by the *executor* at all after applying this patch. As of now, InitPlan()
>>>> takes care of locking all the result relations in the
>>>> PlannedStmt.resultRelations list. This patch removes partitioned
>>>> parent(s) from appearing in this list. But that makes me wonder - aren't
>>>> the locks taken by expand_inherited_rtentry() kept long enough? Why does
>>>> InitPlan need to acquire them again? I see this comment in
>>>> expand_inherited_rtentry:
>>>
>>> Parse-time locks, plan-time locks, and execution-time locks are all
>>> separate. See the header comments for AcquireRewriteLocks in
>>> rewriteHandler.c; think about a view, where the parsed query has been
>>> stored and is later rewritten and planned. Similarly, a plan can be
>>> reused even after the transaction that created that plan has
>>> committed; see AcquireExecutorLocks in plancache.c.
>>
>> Thanks for those references.
>>
>> I took a step back here and thought a bit more about the implications this
>> patch. It occurred to me that the complete absence of partitioned table
>> RT entries in the plan-tree has more consequences than I originally
>> imagined. I will post an updated patch by Monday latest.
>
> Here is the updated patch.
>
> Since this patch proposes to avoid creating scan nodes for non-leaf tables
> in a partition tree, they won't be referenced anywhere in the resulting
> plan tree. So the executor will not lock those tables in the
> select/update/delete cases. Insert is different since we lock all tables
> in the partition tree when setting up tuple-routing in
> ExecInitModifyTable. Not taking executor locks on the tables means that
> the cached plans that should be invalidated upon adding/removing a
> partition somewhere in the partition tree won't be.
>
> First I thought that we could remember just the root table RT index using
> a new Index field partitionRoot in Append, MergeAppend, and ModifyTable
> nodes and use it to locate and lock the root table during executor node
> initialization. But soon realized that that's not enough, because it
> ignores the fact that adding/removing partitions at lower levels does not
> require taking a lock on the root table; only the immediate parent. So
> any cached select/update/delete plans won't be invalidated when a new
> lower-level partition is added/removed, because the immediate parent would
> not have been added to the query's range table and hence the
> PlannedStmt.relationOids list. Remember that the latter is used by
> plancache.c to remember the relations that a given cached plan depends on
> remaining unchanged. So the patch now adds a list member called
> partitioned_rels to Append, MergeAppend, and ModifyTable nodes and stores
> the RT indexes of all the non-leaf tables in a partition tree with root
> table RT index at the head (note that these RT indexes are of the RTEs
> added by expand_inherited_rtenrty; also see below). Since the
> partitioned_rels list is constructed when building paths and must be
> propagated to the plan nodes, the same field is also present in the
> corresponding Path nodes. ExecInit* routines for the aforementioned nodes
> now locate and lock the non-leaf tables using the RT indexes in
> partitioned_rels. Leaf tables are locked, as before, either by InitPlan
> (update/delete result relations case) or by ExecInitAppend or
> ExecInitMergeAppend when initializing the appendplans/mergeplans (select
> case).
>
> The previous proposal was for expand_inherited_rtentry to not create RT
> entries and AppendRelInfo's for the non-leaf tables, but I think that
> doesn't work, as I tried to explain above. We need RTEs because that
> seems to be the only way currently for informing the executor of the
> non-leaf tables. We need AppendRelInfo's to store the RT indexes of those
> RTEs for the latter planning steps to collect them in partitioned_rels
> mentioned above. So with the latest patch, we do create the RT entry and
> AppendRelInfo for non-leaf tables. AppendRelInfo created in this case is
> a minimal one; only parent_relid and child_relid are valid. To make the
> latter planning steps ignore these minimal AppendRelInfo's, every
> AppendRelInfo is now marked with child_relkind. Only
> set_append_rel_pathlist() and inheritance_planner() process them to
> collect the child_relid into the partitioned_rels list to be stored in
> AppendPath/MergeAppendPath and ModifyTablePath, respectively.

Sorry, forgot to attach the patches themselves. Attached this time.

Thanks,
Amit

Attachment Content-Type Size
0001-Avoid-creating-scan-nodes-for-partitioned-tables.patch text/x-diff 51.5 KB
0002-Do-not-allocate-storage-for-partitioned-tables.patch text/x-diff 3.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Artur Zakirov 2017-03-13 10:48:34 Re: IF NOT EXISTS option for CREATE SERVER and CREATE USER MAPPING statements
Previous Message Amit Langote 2017-03-13 10:24:32 Re: Partitioned tables and relfilenode