Re: [HACKERS] UPDATE of partition key

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] UPDATE of partition key
Date: 2017-11-22 10:11:19
Message-ID: 032a04c8-6eca-ba4a-f36f-7cfad908ef13@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Amit.

Looking at the latest v25 patch.

On 2017/11/16 23:50, Amit Khandekar wrote:
> Below has the responses for both Amit's and David's comments, starting
> with Amit's ....
> On 2 November 2017 at 12:40, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> On 2017/10/24 0:15, Amit Khandekar wrote:
>>> On 16 October 2017 at 08:28, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>>
>>>> + (event == TRIGGER_EVENT_UPDATE && ((oldtup == NULL) ^ (newtup
>>>> == NULL))))
>>>>
>>>> Is there some reason why a bitwise operator is used here?
>>>
>>> That exact condition means that the function is called for transition
>>> capture for updated rows being moved to another partition. For this
>>> scenario, either the oldtup or the newtup is NULL. I wanted to exactly
>>> capture that condition there. I think the bitwise operator is more
>>> user-friendly in emphasizing the point that it is indeed an "either a
>>> or b, not both" condition.
>>
>> I see. In that case, since this patch adds the new condition, a note
>> about it in the comment just above would be good, because the situation
>> you describe here seems to arise only during update-tuple-routing, IIUC.
>
> Done. Please check.

Looks fine.

>> + * 'update_rri' has the UPDATE per-subplan result rels. These are re-used
>> + * instead of allocating new ones while generating the array of all leaf
>> + * partition result rels.
>>
>> Instead of:
>>
>> "These are re-used instead of allocating new ones while generating the
>> array of all leaf partition result rels."
>>
>> how about:
>>
>> "There is no need to allocate a new ResultRellInfo entry for leaf
>> partitions for which one already exists in this array"
>
> Ok. I have made it like this :
>
> + * 'update_rri' contains the UPDATE per-subplan result rels. For the
> output param
> + * 'partitions', we don't allocate new ResultRelInfo objects for
> + * leaf partitions for which they are already available
> in 'update_rri'.

Sure.

>>> It looks like the interface does not much simplify, and above that, we
>>> have more number of lines in that function. Also, the caller anyway
>>> has to be aware whether the map_index is the index into the leaf
>>> partitions or the update subplans. So it is not like the caller does
>>> not have to be aware about whether the mapping should be
>>> mt_persubplan_childparent_maps or mt_perleaf_parentchild_maps.
>>
>> Hmm, I think we should try to make it so that the caller doesn't have to
>> be aware of that. And by caller I guess you mean ExecInsert(), which
>> should not be a place, IMHO, where to try to introduce a lot of new logic
>> specific to update tuple routing.
>
> I think, for ExecInsert() since we have already given the job of
> routing the tuple from root partitioned table to a partition, it makes
> sense to give the function the additional job of routing the tuple
> from any partition to any partition. ExecInsert() can be looked at as
> doing this job : "insert a tuple into the right partition; the
> original tuple can belong to any partition"

Yeah, that's one way of looking at that. But I think ExecInsert() as it
is today thinks it's got a *new* tuple and that's it. I think the newly
introduced code in it to find out that it is not so (that the tuple
actually comes from some other partition), that it's really the
update-turned-into-delete-plus-insert, and then switch to the root
partitioned table's ResultRelInfo, etc. really belongs outside of it.
Maybe in its caller, which is ExecUpdate(). I mean why not add this code
to the block in ExecUpdate() that handles update-row-movement.

Just before calling ExecInsert() to do the re-routing seems like a good
place to do all that. For example, try the attached incremental patch
that applies on top of yours. I can see after applying it that diffs to
ExecInsert() are now just some refactoring ones and there are no
significant additions making it look like supporting update-row-movement
required substantial changes to how ExecInsert() itself works.

> After doing the changes for the int[] array map in the previous patch
> version, I still feel that ConvertPartitionTupleSlot() should be
> retained. We save some repeated lines of code saved.

OK.

>> You may be right, but I see for WithCheckOptions initialization
>> specifically that the non-tuple-routing code passes the actual sub-plan
>> when initializing the WCO for a given result rel.
>
> Yes that's true. The problem with WithCheckOptions for newly allocated
> partition result rels is : we can't use a subplan for the parent
> parameter because there is no subplan for it. But I will still think
> on it a bit more (TODO).

Alright.

>>> I think you are suggesting we do it like how it's done in
>>> is_partition_attr(). Can you please let me know other places we do
>>> this same way ? I couldn't find.
>>
>> OK, not as many as I thought there would be, but there are following
>> beside is_partition_attrs():
>>
>> partition.c: get_range_nulltest()
>> partition.c: get_qual_for_range()
>> relcache.c: RelationBuildPartitionKey()
>>
>
> Ok, I think I will first address Robert's suggestion of re-using
> is_partition_attrs() for pull_child_partition_columns(). If I do that,
> this discussion won't be applicable, so I am deferring this one.
> (TODO)

Sure, no problem.

Thanks,
Amit

Attachment Content-Type Size
v25-delta-pass-root-from-ExecUpdate.patch text/plain 3.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2017-11-22 10:22:30 Re: [HACKERS] [PATCH] Incremental sort
Previous Message Amit Langote 2017-11-22 09:21:27 Re: default range partition and constraint exclusion