Re: UPDATE of partition key

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UPDATE of partition key
Date: 2017-10-04 13:51:40
Message-ID: CAJ3gD9cazfppe7-wwUbabPcQ4_0SubkiPFD1+0r5_DkVNWo5yg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 30 September 2017 at 01:23, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> If I understand correctly, the reason for changing mt_partitions from
>>>> ResultRelInfo * to ResultRelInfo ** is that, currently, all of the
>>>> ResultRelInfos for a partitioning hierarchy are allocated as a single
>>>> chunk, but we can't do that and also reuse the ResultRelInfos created
>>>> during InitPlan. I suggest that we do this as a preparatory patch.
>>>
>>> Ok, will prepare a separate patch. Do you mean to include in that
>>> patch the changes I did in ExecSetupPartitionTupleRouting() that
>>> re-use the ResultRelInfo structures of per-subplan update result rels
>>> ?
>>
>> Above changes are in attached
>> 0001-Re-use-UPDATE-result-rels-created-in-InitPlan.patch.
>
> No, not all of those changes. Just the adjustments to make
> ModifyTableState's mt_partitions be of type ResultRelInfo ** rather
> than ResultRelInfo *, and anything closely related to that. Not, for
> example, the num_update_rri stuff.

Ok. Attached is the patch modified to have changes only to handle
array of ResultRelInfo * instead of array of ResultRelInfo.

-------

On 4 October 2017 at 01:08, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Oct 3, 2017 at 8:16 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>> While writing this down, I observed that after multi-level partition
>> tree expansion was introduced, the child table expressions are not
>> converted directly from the root. Instead, they are converted from
>> their immediate parent. So there is a chain of conversions : to leaf
>> from its parent, to that parent from its parent, and so on from the
>> root. Effectively, during the first conversion, there are that many
>> ConvertRowtypeExpr nodes one above the other already present in the
>> UPDATE result rel expressions. But my patch handles the optimization
>> only for the leaf partition conversions.
>
> Maybe adjust_appendrel_attrs() should have a similar provision for
> avoiding extra ConvertRowTypeExpr nodes?

Yeah, I think we should be able to do that. Will check.

------

On 19 September 2017 at 13:15, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> On 18 September 2017 at 20:45, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>> Please find few more comments.
>>
>> + * in which they appear in the PartitionDesc. Also, extract the
>> + * partition key columns of the root partitioned table. Those of the
>> + * child partitions would be collected during recursive expansion.
>> */
>> + pull_child_partition_columns(&all_part_cols, oldrelation, oldrelation);
>> expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
>> lockmode, &root->append_rel_list,
>> + &all_part_cols,
>>
>> pcinfo->all_part_cols is only used in case of update, I think we can
>> call pull_child_partition_columns
>> only if rte has updateCols?
>>
>> @@ -2029,6 +2034,7 @@ typedef struct PartitionedChildRelInfo
>>
>> Index parent_relid;
>> List *child_rels;
>> + Bitmapset *all_part_cols;
>> } PartitionedChildRelInfo;
>>
>> I might be missing something, but do we really need to store
>> all_part_cols inside the
>> PartitionedChildRelInfo, can't we call pull_child_partition_columns
>> directly inside
>> inheritance_planner whenever we realize that RTE has some updateCols
>> and we want to
>> check the overlap?
>
> One thing we will have to do extra is : Open and close the
> partitioned rels again. The idea was that we collect the bitmap
> *while* we are already expanding through the tree and the rel is open.
> Will check if this is feasible.

While giving more thought on this suggestion of Dilip's, I found out
that pull_child_partition_columns() is getting called with child_rel
and its immediate parent. That means, it maps the child rel attributes
to its immediate parent. If that immediate parent is not the root
partrel, then the conversion is not sufficient. We need to map child
rel attnos to root partrel attnos. So for a partition tree with 3 or
more levels, with the bottom partitioned rel having different att
ordering than the root, this will not work.

Before the commit that enabled recursive multi-level partition tree
expansion, pull_child_partition_columns() was always getting called
with child_rel and root rel. So this issue crept up when I rebased
over this commit, overlooking the fact that parent rel is the
immediate parent, not the root parent.

Anyways, I think Dilip's suggestion makes sense : we can do the
finding-all-part-cols work separately in inheritance_planner() using
the partitioned_rels handle. Re-opening the partitioned tables should
be cheap, because they have already been opened earlier, so they are
available in relcache. So did this as he suggested using new function
get_all_partition_cols(). While doing that, I have ensured that we use
the root rel to map all the child rel attnos. So the above issue is
fixed now.

Also added test scenarios that test the above issue. Namely, made the
partition tree 3 level, and added some specific scenarios where it
used to wrongly error out without trying to move the tuple, because it
determined partition-key is not updated.

---------

Though we re-use the update result rels, the WCO and Returning
expressions were not getting re-used from those update result rels.
This check was missing :
@@ -2059,7 +2380,7 @@ ExecInitModifyTable(ModifyTable *node, EState
*estate, int eflags)
for (i = 0; i < mtstate->mt_num_partitions; i++)
{
Relation partrel;
List *rlist;

resultRelInfo = mtstate->mt_partitions[i];
+
+ /*
+ * If we are referring to a resultRelInfo from one of the update
+ * result rels, that result rel would already have a returningList
+ * built.
+ */
+ if (resultRelInfo->ri_projectReturning)
+ continue;
+
partrel = resultRelInfo->ri_RelationDesc;

Added this check in the patch.

----------

On 22 September 2017 at 16:13, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> On 21 September 2017 at 19:52, amul sul <sulamul(at)gmail(dot)com> wrote:
>>
>> 86 - (event == TRIGGER_EVENT_UPDATE &&
>> !trigdesc->trig_update_after_row))
>> 87 + (event == TRIGGER_EVENT_UPDATE &&
>> !trigdesc->trig_update_after_row) ||
>> 88 + (event == TRIGGER_EVENT_UPDATE && (oldtup == NULL || newtup
>> == NULL)))
>> 89 return;
>> 90 }
>>
>>
>> Either of oldtup or newtup will be valid at a time & vice versa. Can we
>> improve
>> this check accordingly?
>>
>> For e.g.:
>> (event == TRIGGER_EVENT_UPDATE && )(HeapTupleIsValid(oldtup) ^
>> ItemPointerIsValid(newtup)))))
>
>Ok, I will be doing this as below :
>- (event == TRIGGER_EVENT_UPDATE && (oldtup == NULL || newtup == NULL)))
>+ (event == TRIGGER_EVENT_UPDATE && ((oldtup == NULL) ^ (newtup == NULL))))

Have done this in the attached patch.

--------

Attached are these patches :

Preparatory patches :
0001-Prepare-for-re-using-UPDATE-result-rels-during-tuple.patch
0002-Prevent-a-redundant-ConvertRowtypeExpr-node.patch
Main patch :
update-partition-key_v20.patch

Thanks
-Amit Khandekar

Attachment Content-Type Size
0001-Prepare-for-re-using-UPDATE-result-rels-during-tuple.patch application/octet-stream 12.9 KB
0002-Prevent-a-redundant-ConvertRowtypeExpr-node.patch application/octet-stream 3.6 KB
update-partition-key_v20.patch application/octet-stream 114.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-10-04 14:04:11 Re: 64-bit queryId?
Previous Message Michael Paquier 2017-10-04 13:49:06 Re: 64-bit queryId?