Re: UPDATE of partition key

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UPDATE of partition key
Date: 2017-11-10 03:42:07
Message-ID: CAJ3gD9d_RXqb3GU9MD2cy8ftjOTYwRT9LV4ZOVHG+5atqFmT-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2 November 2017 at 12:40, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> ISTM, ModifyTableState now has one too
> many TupleConversionMap pointer arrays after the patch, creating the need
> to choose from in the first place. AIUI -
>
> * mt_perleaf_parentchild_maps:
>
> - each entry is a map to convert root parent's tuples to a given leaf
> partition's format
>
> - used to be called mt_partition_tupconv_maps and is needed when tuple-
> routing is in use; for both INSERT and UPDATE with tuple-routing
>
> - as many entries in the array as there are leaf partitions and stored
> in the partition bound order
>
> * mt_perleaf_childparent_maps:
>
> - each entry is a map to convert a leaf partition's tuples to the root
> parent's format
>
> - newly added by this patch and seems to be needed for UPDATE with
> tuple-routing for two needs: 1. tuple-routing should start with a
> tuple in root parent format whereas the tuple received is in leaf
> partition format when ExecInsert() called for update-tuple-routing (by
> ExecUpdate), 2. after tuple-routing, we must capture the tuple
> inserted into the partition in the transition tuplestore which accepts
> tuples in root parent's format
>
> - as many entries in the array as there are leaf partitions and stored
> in the partition bound order
>
> * mt_persubplan_childparent_maps:
>
> - each entry is a map to convert a child table's tuples to the root
> parent's format
>
> - used to be called mt_transition_tupconv_maps and needed for converting
> child tuples to the root parent's format when storing them in the
> transition tuplestore which accepts tuples in root parent's format
>
> - as many entries in the array as there are sub-plans in mt_plans and
> stored in either the partition bound order or unknown order (the
> latter in the regular inheritance case)

thanks for the detailed description. Yet that's correct.

>
> I think we could combine the last two into one. The only apparent reason
> for them to be separate seems to be that the subplan array might contain
> less entries than perleaf array and ExecInsert() has only enough
> information to calculate the offset of a map in the persubplan array.
> That is, resultRelInfo of leaf partition that ExecInsert starts with in
> the update-tuple-routing case comes from mtstate->resultRelInfo array
> which contains only mt_nplans entries. So, if we only have the array with
> entries for *all* partitions, it's hard to get the offset of the map to
> use in that array.
>
> I suggest we don't add a new map array and a significant amount of new
> code to initialize the same and to implement the logic to choose the
> correct array to get the map from. Instead, we could simply add an array
> of integers with mt_nplans entries. Each entry is an offset of a given
> sub-plan in the array containing entries of something for *all*
> partitions. Since, we are teaching ExecSetupPartitionTupleRouting() to
> reuse ResultRelInfos from mtstate->resultRelInfos, there is a suitable
> place to construct such array. Let's say the array is called
> mt_subplan_partition_offsets[]. Let ExecSetupPartitionTupleRouting() also
> initialize the parent-to-partition maps for *all* partitions, in the
> update-tuple-routing case. Then add a quick-return check in
> ExecSetupTransitionCaptureState() to see if the map has already been set
> by ExecSetupPartitionTupleRouting(). Since we're using the same map for
> two purposes, we could rename mt_transition_tupconv_maps to something that
> doesn't bind it to its use only for transition tuple capture.

I was trying hard to verify whether this is really going to simplify
the code. We are removing one array and adding one. In my approach,
the map structures are anyway shared, they are not duplicated. Because
I have separate arrays to access the tuple conversion map
partition-based or subplan-based, there is no need for extra logic to
get into the per-partition array. But on the other hand, we need not
do that many changes in ExecSetupTransitionCaptureState() that I have
done, although my patch hasn't resulted in more number of line in that
function; it has just changed the logic.

Also, each time we access the map, we need to know whether it is
per-plan or per-partition, according to a set of factors like whether
transition tables are there and whether tuple routing is there.

But I realized that one plus point of your approach is that it is
going to be extensible if we later need to have some more per-subplan
information that is already there in a partition-wise array. In that
case, we just need to re-use the int[] map; we don't have to create
two new separate arrays; just create one per-leaf array, and use the
map to get into one of its elements, given a per-subplan index.

So I went ahead and did the changes :

New mtstate maps :

TupleConversionMap **mt_parentchild_tupconv_maps;
/* Per partition map for tuple conversion from root to leaf */
TupleConversionMap **mt_childparent_tupconv_maps;
/* Per plan/partition map for tuple conversion from child to root */
int *mt_subplan_partition_offsets;
/* Stores position of update result rels in leaf partitions */

We need to know whether mt_childparent_tupconv_maps is per-plan or
per-partition. Each time this map is accessed, it is tedious to go
through conditions that determine whether that map is per-partition or
not. Here are the conditions :

For transition tables
per-leaf map needed : in presence of tuple routing (insert or
update, whichever).
per-plan map needed : in presence of simple update (i.e. routing
not involved)
For update tuple routing.
per-plan map needed : always

So instead, added a new bool mtstate->mt_is_tupconv_perpart field that
is set to true only while setting up transition tables and that too
only when tuple routing is to be done.

Since both transition tables and update tuple routing need a
child-parent map, extracted the code to build the map into a common
function ExecSetupChildParentMap(). (I think I could have done this
earlier also)

Each time we need to access this map, we not only have to use the
int[] maps, we also need to first check if it's a per-leaf map. So put
this logic in tupconv_map_for_subplan() and used this everywhere we
need the map.

Attached is v23 patch that has just the above changes (and also
rebased on hash-partitioning changes, like update.sql). I am still
doing some sanity testing on this, although regression passes.

I am yet to respond to the other review comments; will do that with a v24 patch.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
update-partition-key_v23.patch application/octet-stream 111.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-11-10 04:18:48 Re: [POC] Faster processing at Gather node
Previous Message Kyotaro HORIGUCHI 2017-11-10 03:30:00 Re: path toward faster partition pruning