Re: UPDATE of partition key

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UPDATE of partition key
Date: 2017-08-11 05:14:58
Message-ID: CAJ3gD9e+C7tTPTLe2J+Y4AWcQi-t2PJASDPaurfhr2amxPjsMA@mail.gmail.com
Lists: pgsql-hackers

On 4 August 2017 at 22:28, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>>
>> Below are the TODOS at this point :
>>
>> Do something about two separate mapping tables for Transition tables
>> and update tuple-routing.
> On 1 July 2017 at 03:15, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> Would it make sense to have a set of functions with names like
>> GetConvertor{From,To}{Subplan,Leaf}(mtstate, index) which build arrays
>> m_convertors_{from,to}_by_{subplan,leaf} the first time they need
>> them?
>
> This was discussed here : [2]. I think even if we build them only when
> needed, in the presence of both tuple routing and transition tables we
> still need separate arrays. So rather than dynamic arrays, we can have
> static arrays whose elements point to a shared TupleConversionMap
> structure whenever possible.
> As the patch already stands, for insert/update tuple routing there is
> a per-leaf-partition mt_transition_tupconv_maps array for transition
> tables, and a separate per-subplan array mt_resultrel_maps for update
> tuple routing. *But* what I am proposing is: for any
> mt_transition_tupconv_maps[] element whose leaf partition also exists
> as a per-subplan result rel, that array element and the corresponding
> mt_resultrel_maps[] element will point to the same TupleConversionMap
> structure.
>
> This is quite similar to how we are re-using the per-subplan
> resultrels for the per-leaf result rels. We will re-use the
> per-subplan TupleConversionMap for the per-leaf
> mt_transition_tupconv_maps[] elements.
>
> This is not yet implemented.

The attached patch has the needed changes described above. We now have
the following map arrays in ModifyTableState. The earlier naming was
confusing, so I have renamed them:

mt_perleaf_parentchild_maps : used for converting insert/update routed
tuples from the root to the destination leaf partition.
mt_perleaf_childparent_maps : used by transition tables for converting
tuples back from a leaf partition to the root.
mt_persubplan_childparent_maps : used by both transition tables and
update-row movement, each for its own purpose, during UPDATEs.
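
Roughly, the sharing between the per-subplan and per-leaf maps looks
like the sketch below. This is only an illustration of the idea, not
the exact patch code: leaf_is_subplan_resultrel() and
subplan_index_for_leaf() are made-up helper names, and partitions[] /
rootRel stand for the leaf ResultRelInfo array and the root relation
that are available at setup time.

/*
 * Build the per-leaf child-to-parent maps, re-using the per-subplan
 * map whenever the leaf partition is also one of the UPDATE's
 * subplan result relations.
 */
int         i;

for (i = 0; i < mtstate->mt_num_partitions; i++)
{
    if (leaf_is_subplan_resultrel(mtstate, i))
    {
        int         j = subplan_index_for_leaf(mtstate, i);

        /* Point at the shared TupleConversionMap; don't build a new one. */
        mtstate->mt_perleaf_childparent_maps[i] =
            mtstate->mt_persubplan_childparent_maps[j];
    }
    else
    {
        Relation    partrel = partitions[i]->ri_RelationDesc;

        mtstate->mt_perleaf_childparent_maps[i] =
            convert_tuples_by_name(RelationGetDescr(partrel),
                                   RelationGetDescr(rootRel),
                                   gettext_noop("could not convert row type"));
    }
}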

I also had to add another partition slot, mt_rootpartition_tuple_slot,
alongside mt_partition_tuple_slot. For update-row movement, ExecInsert()
used to keep the root partition's tuple and the leaf partition's tuple
in a common slot, so the former was only a transient tuple. But
mtstate->mt_transition_capture->tcs_original_insert_tuple requires the
tuple to remain valid, so we cannot pass a transient tuple. Hence the
additional partition slot.
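
To show what I mean, here is a rough sketch of the ExecInsert() part.
Again this is only illustrative, not the exact patch code; tuple, map,
partrel (the leaf partition's Relation) and rootRelation stand for the
locals already available at that point.

/*
 * Keep a valid root-format copy of the routed tuple in its own
 * dedicated slot, instead of re-using the slot that is about to be
 * overwritten with the converted leaf-format tuple.
 */
TupleTableSlot *rootslot = mtstate->mt_rootpartition_tuple_slot;

ExecSetSlotDescriptor(rootslot, RelationGetDescr(rootRelation));
ExecStoreTuple(heap_copytuple(tuple), rootslot, InvalidBuffer, true);

if (mtstate->mt_transition_capture)
    mtstate->mt_transition_capture->tcs_original_insert_tuple =
        ExecMaterializeSlot(rootslot);

/* The existing partition slot is then free to hold the converted tuple. */
if (map != NULL)
{
    tuple = do_convert_tuple(tuple, map);
    ExecSetSlotDescriptor(mtstate->mt_partition_tuple_slot,
                          RelationGetDescr(partrel));
    ExecStoreTuple(tuple, mtstate->mt_partition_tuple_slot,
                   InvalidBuffer, true);
}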

-------

But in the first place, while testing transition-table behaviour with
update row movement, I found that the OLD TABLE and NEW TABLE
transition tables do not get populated with rows that are moved to
another partition. This is because the operation is carried out as
ExecDelete() plus ExecInsert(), which do not run the transition-related
processing for updates. Even though transition-table triggers are
statement-level, the AR ROW trigger functions such as
ExecARUpdateTriggers() do get run for each row so that the transition
tables get populated; they just skip the usual row-level trigger
firing. For update row movement, we therefore need to teach
ExecARUpdateTriggers() to run the transition-related processing for the
DELETE+INSERT operation as well. But since the delete and the insert
happen on different tables, we cannot call ExecARUpdateTriggers() at a
single place: we need to call it once after ExecDelete() to load the
OLD row, and again after ExecInsert() to load the NEW row. Also,
ExecARUpdateTriggers() currently does not allow a NULL old tuple or new
tuple, but we need to allow that for the above transition table
processing.
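
Roughly, the shape I have in mind is like this (illustrative only; the
variable names differ in the patch, and it assumes
ExecARUpdateTriggers() has been taught to accept a NULL old or new
tuple and do only the transition capture in that case):

/* After ExecDelete() on the source partition: capture the OLD row. */
ExecARUpdateTriggers(estate, source_resultRelInfo,
                     tupleid, NULL,     /* fdw_trigtuple */
                     NULL,              /* no new tuple yet */
                     NIL,
                     mtstate->mt_transition_capture);

/* ... then route the tuple and ExecInsert() it into the destination ... */

/* After ExecInsert() on the destination partition: capture the NEW row. */
ExecARUpdateTriggers(estate, dest_resultRelInfo,
                     NULL, NULL,        /* no old tuple here */
                     newtuple,
                     recheckIndexes,
                     mtstate->mt_transition_capture);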

The attached patch includes these changes as well.

>
>> Use getASTriggerResultRelInfo() for attrno mapping, rather than first
>> resultrel, for generating child WCO/RETURNING expression.
>>
>
> Regarding generating child WithCheckOption and Returning expressions
> using those of the root result relation, ModifyTablePath and
> ModifyTable should have new fields rootReturningList (and
> rootWithCheckOptions) which would be derived from
> root->parse->returningList in inheritance_planner(). But then, similar
> to per-subplan returningList, rootReturningList would have to pass
> through set_plan_refs()=>set_returning_clause_references() which
> requires the subplan targetlist to be passed. Because of this, for
> rootReturningList, we would require a subplan for the root partition,
> which is not there currently; we have subplans only for the child
> rels. That means we would have to create such a plan only for the sake
> of generating rootReturningList.
>
> The other option is to do the way the patch is currently doing in the
> executor by using the returningList of the first per-subplan result
> rel to generate the other child returningList (and WithCheckOption).
> This works by applying map_partition_varattnos() to the first
> returningList. But now that we have realized that whole-row vars need
> special handling, map_partition_varattnos() would need some changes to
> convert whole-row vars differently for child-rel-to-child-rel mapping.
> For child-rel-to-child-rel conversion,
> the whole-row var is already wrapped by ConvertRowtypeExpr, but we
> need to change its Var->vartype to the new child vartype.
>
> I think the second option looks easier, but I am open to suggestions,
> and I am myself still checking the first one.

I have made the changes using the second option above. In the attached
patch, the same map_partition_varattnos() is called for child-to-child
mapping. But in that case the source child's expression already has a
ConvertRowtypeExpr node, so another ConvertRowtypeExpr node is not
added; only the contained whole-row Var is updated with the new
composite type. In the regression test, I have used different types
such as numeric, int and text for the partition key columns so as to
exercise this conversion.
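
For reference, the whole-row handling for the child-to-child case
essentially boils down to something like the following inside the
conversion walker (illustrative sketch only, not the patch's actual
code; to_rowtype stands for the destination child's composite type
OID):

if (IsA(node, ConvertRowtypeExpr))
{
    ConvertRowtypeExpr *r = (ConvertRowtypeExpr *) node;

    /* A whole-row Var is already wrapped: child rowtype -> parent rowtype. */
    if (IsA(r->arg, Var) && ((Var *) r->arg)->varattno == 0)
    {
        Var        *var = (Var *) r->arg;

        /*
         * Retarget the inner Var to the new child's composite type
         * instead of stacking another ConvertRowtypeExpr on top.
         */
        var->vartype = to_rowtype;
        /* r->resulttype remains the parent's rowtype, as before. */
    }
}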

>> More test scenarios in regression tests.
>> Need to check/test whether we are correctly applying insert policies
>> (and not update) while inserting a routed tuple.
>
> Yet to do above two.

Both of these are still to be done.

Attachment Content-Type Size
update-partition-key_v15.patch application/octet-stream 122.1 KB
