Re: UPDATE of partition key

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UPDATE of partition key
Date: 2017-06-15 17:36:33
Message-ID: CAJ3gD9eDj2efc4HixYmV1feXF_EWU97EY6zcsfvG0VpJHXHpYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 13 June 2017 at 15:40, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> While rebasing my patch for the below recent commit, I realized that a
> similar issue exists for the uptate-tuple-routing patch as well :
>
> commit 78a030a441966d91bc7e932ef84da39c3ea7d970
> Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Date: Mon Jun 12 23:29:44 2017 -0400
>
> Fix confusion about number of subplans in partitioned INSERT setup.
>
> The above issue was about incorrectly using 'i' in
> mtstate->mt_plans[i] during handling WITH CHECK OPTIONS in
> ExecInitModifyTable(), where 'i' was actually meant to refer to the
> positions in mtstate->mt_num_partitions. Actually for INSERT, there is
> only a single plan element in mtstate->mt_plans[] array.
>
> Similarly, for update-tuple routing, we cannot use
> mtstate->mt_plans[i], because 'i' refers to position in
> mtstate->mt_partitions[] , whereas mtstate->mt_plans is not at all in
> order of mtstate->mt_partitions; in fact mt_plans has only the plans
> that are to be scanned on pruned partitions; so it can well be a small
> subset of total partitions.
>
> I am working on an updated patch to fix the above.

Attached patch v10 fixes the above. In the existing code, where it
builds WCO constraints for each leaf partition; with the patch, that
code now is applicable to row-movement-updates as well. So the
assertions in the code are now updated to allow the same. Secondly,
the mapping for each of the leaf partitions was constructed using the
root partition attributes. Now in the patch, the
mtstate->resultRelInfo[0] (i.e. the first resultRelInfo) is used as
reference. So effectively, map_partition_varattnos() now represents
not just parent-to-partition mapping, but rather, mapping between any
two partitions/partitioned_tables. It's done this way, so that we can
have a common WCO building code for inserts as well as updates. For
e.g. for inserts, the first (and only) WCO belongs to
node->nominalRelation so nominalRelation is used for
map_partition_varattnos(), whereas for updates, first WCO belongs to
the first resultRelInfo which is not same as nominalRelation. So in
the patch, in both cases, we use the first resultRelInfo and the WCO
of the first resultRelInfo for map_partition_varattnos().

Similar thing is done for Returning expressions.

---------

Another change in the patch is : for ExecInitQual() for WCO quals,
mtstate->ps is used as parent, rather than first plan. For updates,
first plan does not belong to the parent partition. In fact, I think
in all cases, we should use mtstate->ps as the parent.
mtstate->mt_plans[0] don't look like they should be considered parent
of these expressions. May be it does not matter to which parent we
link these quals, because there is no ReScan for ExecModifyTable().

Note that for RETURNING projection expressions, we do use mtstate->ps.

--------

There is another issue I discovered. The row-movement works fine if
the destination leaf partition has different attribute ordering than
the root : the existing insert-tuple-routing mapping handles that. But
if the source partition has different ordering w.r.t. the root, it has
a problem : there is no mapping in the opposite direction, i.e. from
the leaf to root. And we require that because the tuple of source leaf
partition needs to be converted to root partition tuple descriptor,
since ExecFindPartition() starts with root.

To fix this, I have introduced another mapping array
mtstate->mt_resultrel_maps[]. This corresponds to the
mtstate->resultRelInfo[]. We don't require per-leaf-partition mapping,
because the update result relations are pruned subset of the total
leaf partitions.

So in ExecInsert, before calling ExecFindPartition(), we need to
convert the leaf partition tuple to root using this reverse mapping.
Since we need to convert the tuple here, and again after
ExecFindPartition() for the found leaf partition, I have replaced the
common code by new function ConvertPartitionTupleSlot().

-------

Used a new flag is_partitionkey_update in ExecInitModifyTable(), which
can be re-used in subsequent sections , rather than again calling
IsPartitionKeyUpdate() function again.

-------

Some more test scenarios added that cover above changes. Basically
partitions that have different tuple descriptors than parents.

Attachment Content-Type Size
update-partition-key_v10.patch application/octet-stream 49.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2017-06-15 17:41:10 Re: Getting server crash on Windows when using ICU collation
Previous Message Bruce Momjian 2017-06-15 17:07:35 Re: Broken hint bits (freeze)