Quick Links

Re: [HACKERS] UPDATE of partition key

From:	Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject:	Re: [HACKERS] UPDATE of partition key
Date:	2018-01-05 12:12:08
Message-ID:	CAJ3gD9cue54GbEzfV-61nyGpijvjZgCcghvLsB0_nL8Nm8HzCA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 5 January 2018 at 03:04, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Jan 3, 2018 at 6:29 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>> ExecSetupPartitionTupleRouting() now returns PartitionTupleRouting *.
>>
>> Did this change in v3 version of
>> 0001-Encapsulate-partition-related-info-in-a-structure.patch
>
> I'll have to come back to some of the other open issues, but 0001 and
> 0005 look good to me now, so I pushed those as a single commit after
> fixing a few things that pgindent didn't like. I also think 0002 and
> 0003 look basically good, so I pushed those two as a single commit
> also. But the comment changes in 0003 didn't seem extensive enough to
> me so I made a few more changes there along the way.

Thanks. Attached is a rebased update-partition-key_v34.patch, which
also has the changes as per David Rowley's review comments as
explained below.

The above patch is to be applied over the last remaining preparatory
patch, now named (and attached) :
0001-Refactor-CheckConstraint-related-code.patch

On 3 January 2018 at 19:22, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> I've not finished looking at the regression tests yet, but here are a
> few things, some may have been changed in v33, I've not looked yet.
> Also apologies in advance if anything seems nitpicky.
No worries. In fact, it's good to do this right now, otherwise it's
difficult to notice and fix at later point of time. Thanks.

>
> 1. "by INSERT" -> "by an INSERT" in:
>
> from the original partition followed by <command>INSERT</command> into the
>
> 2. "and INSERT" -> "and an INSERT" in:
>
> a <command>DELETE</command> and <command>INSERT</command>. As far as
>
> 3. "due partition-key change" -> "due to the partition-key being changed" in:
>
> * capture is happening for UPDATEd rows being moved to another partition due
> * partition-key change, then this function is called once when the row is
>
> 4. "inserted to another" -> "inserted into another" in:
>
> * deleted (to capture OLD row), and once when the row is inserted to another
>
> 5. "for UPDATE event" -> "for an UPDATE event" (singular), or -> "for
> UPDATE events" (plural)
>
> * oldtup and newtup are non-NULL. But for UPDATE event fired for
>
> I'm unsure if you need singular or plural. It perhaps does not matter.
>
> 6. "for row" -> "for a row" in:
>
> * movement, oldtup is NULL when the event is for row being inserted,
>
> Likewise in:
>
> * whereas newtup is NULL when the event is for row being deleted.

Done all of the above.

>
> 7. In the following fragment the code does not do what the comment says:
>
> /*
> * If transition tables are the only reason we're here, return. As
> * mentioned above, we can also be here during update tuple routing in
> * presence of transition tables, in which case this function is called
> * separately for oldtup and newtup, so either can be NULL, not both.
> */
> if (trigdesc == NULL ||
> (event == TRIGGER_EVENT_DELETE && !trigdesc->trig_delete_after_row) ||
> (event == TRIGGER_EVENT_INSERT && !trigdesc->trig_insert_after_row) ||
> (event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row) ||
> (event == TRIGGER_EVENT_UPDATE && ((oldtup == NULL) ^ (newtup == NULL))))
> return;
>
> With the comment; "so either can be NULL, not both.", I'd expect a
> boolean OR not an XOR.
>
> maybe the comment is better written as:
>
> "so we expect exactly one of them to be non-NULL"

Ok. Made it : "so we expect exactly one of them to be NULL"

>
> (I know you've been discussing with Robert, so I've not checked v33 to
> see if this still exists)

Yes, it's not yet concluded.

>
> 8. I'm struggling to make sense of this:
>
> /*
> * Save a tuple conversion map to convert a tuple routed to this
> * partition from the parent's type to the partition's.
> */
>
> Maybe it's better to write this as:
>
> /*
> * Generate a tuple conversion map to convert tuples of the parent's
> * type into the partition's type.
> */

This is existing code; not from my patch.

>
> 9. insert should be capitalised here and should be prefixed with "an":
>
> /*
> * Verify result relation is a valid target for insert operation. Even
> * for updates, we are doing this for tuple-routing, so again, we need
> * to check the validity for insert operation.
> */
> CheckValidResultRel(leaf_part_rri, CMD_INSERT);
>
> Maybe it's better to write:
>
> /*
> * Verify result relation is a valid target for an INSERT. An UPDATE of
> * a partition-key becomes a DELETE/INSERT operation, so this check is
> * still required when the operation is CMD_UPDATE.
> */

Done. Instead of DELETE/INSERT, used DELETE+INSERT.

>
> 10. The following code would be more clear if you replaced
> mtstate->mt_transition_capture with transition_capture.
>
> if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
> && mtstate->mt_transition_capture->tcs_update_new_table)
> {
> ExecARUpdateTriggers(estate, resultRelInfo, NULL,
> NULL,
> tuple,
> NULL,
> mtstate->mt_transition_capture);
>
> /*
> * Now that we have already captured NEW TABLE row, any AR INSERT
> * trigger should not again capture it below. Arrange for the same.
> */
> transition_capture = NULL;
> }
>
> You are, after all, doing:
>
> transition_capture = mtstate->mt_transition_capture;
>
> at the top of the function. There are a few other places you're also
> accessing mtstate->mt_transition_capture.

Actually I wanted to be able to have a temporary variable that has
it's scope only for ExecARInsertTriggers(). But because that wasn't
possible, had to declare it at the top. I feel if we use
transition_capture all over, and if some future code below the NULL
assignment starts using transition_capture, it will wrongly get the
left-over NULL value.

Instead, what I have done is : used a special variable name only for
this purpose : ar_insert_trig_tcs, so that code won't use this
variable, by looking at it's name. And also moved it's assignment down
to where it is used the first time.

Similarly for ExecDelete(), used ar_delete_trig_tcs.

>
> 11. Should tuple_deleted and process_returning be camelCase like the
> other params?:
>
> static TupleTableSlot *
> ExecDelete(ModifyTableState *mtstate,
> ItemPointer tupleid,
> HeapTuple oldtuple,
> TupleTableSlot *planSlot,
> EPQState *epqstate,
> EState *estate,
> bool *tuple_deleted,
> bool process_returning,
> bool canSetTag)

Done.

>
> 12. The following comment talks about "target table descriptor", which
> I think is a good term. In several other places, you mention "root",
> which I take it to mean "target table".
>
> * This map array is required for two purposes :
> * 1. For update-tuple-routing. We need to convert the tuple from the subplan
> * result rel to the root partitioned table descriptor.
> * 2. For capturing transition tuples when the target table is a partitioned
> * table. For updates, we need to convert the tuple from subplan result rel to
> * target table descriptor, and for inserts, we need to convert the inserted
> * tuple from leaf partition to the target table descriptor.
>
> I'd personally rather we always talked about "target" rather than
> "root". I understand there's probably many places in the code
> where we talk about the target table as "root", but I really think we
> need to fix that, so I'd rather not see the problem get any worse
> before it gets better.

Not very sure if that's true at all places. In some contexts, it makes
sense to use root to emphasize that it is the root partitioned table.
E.g. :

+ * For ExecInsert(), make it look like we are inserting into the
+ * root.
+ */
+ Assert(mtstate->rootResultRelInfo != NULL);
+ estate->es_result_relation_info = mtstate->rootResultRelInfo;

+ * resultRelInfo is one of the per-subplan resultRelInfos. So we
+ * should convert the tuple into root's tuple descriptor, since
+ * ExecInsert() starts the search from root. The tuple conversion

>
> The comment block might also look better if you tab indent after the
> 1. and 2. then on each line below it.

Used spaces instead of tab, because tab was taking it too much away
from the numbers, which looked odd.

> Also the space before the ':' is not correct.
Done

>
> 13. Does the following code really need to palloc0 rather than just palloc?
>
> /*
> * Build array of conversion maps from each child's TupleDesc to the
> * one used in the tuplestore. The map pointers may be NULL when no
> * conversion is necessary, which is hopefully a common case for
> * partitions.
> */
> mtstate->mt_childparent_tupconv_maps = (TupleConversionMap **)
> palloc0(sizeof(TupleConversionMap *) * numResultRelInfos);
>
> I don't see any case in the initialization of the array where any of
> the elements are not assigned a value, so I think palloc() is fine.

Right. Used palloc().

>
> 14. I don't really like the way tupconv_map_for_subplan() works. It
> would be nice to have two separate functions for this, but looking a
> bit more at it, it seems the caller won't just need to always call
> exactly one of those functions. I don't have any ideas to improve it,
> so this is just a note.

I am assuming you mean one function for the case where
mt_is_tupconv_perpart is true, and the other function when it is not
true. The idea is, the caller should not have to worry if the map is
per-subplan or not.

>
> 15. I still don't really like the way ExecInitModifyTable() sets and
> unsets update_tuple_routing_needed. I know we talked about this
> before, but couldn't you just change:
>
> if (resultRelInfo->ri_TrigDesc &&
> resultRelInfo->ri_TrigDesc->trig_update_before_row &&
> operation == CMD_UPDATE)
> update_tuple_routing_needed = true;
>
> To:
>
> if (resultRelInfo->ri_TrigDesc &&
> resultRelInfo->ri_TrigDesc->trig_update_before_row &&
> node->partitioned_rels != NIL &&
> operation == CMD_UPDATE)
> update_tuple_routing_needed = true;
>
> and get rid of:
> .....
> if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
> update_tuple_routing_needed = false;
>
> looking at inheritance_planner(), partitioned_rels is only set to a
> non-NIL value if parent_rte->relkind == RELKIND_PARTITIONED_TABLE.
>

Initially update_tuple_routing_needed can be already true because of :
bool update_tuple_routing_needed = node->partKeyUpdated;

So if it's not a partitioned table and update_tuple_routing_needed is
set to true due to the above declaration, the variable will remain
true if we don't check the relkind in the end, which means the final
conclusion will be that update-tuple-routing is required, when it is
really not. Now, I understand that node->partKeyUpdated will not be
true if it's a partitioned table, but I think we better play safe
here. partKeyUpdated as per its name implies whether any of the
partition key columns are updated; it does not imply whether the
target table is a partitioned table or just a partition.

> 16. "named" -> "target" in:
>
> * 'partKeyUpdated' is true if any partitioning columns are being updated,
> * either from the named relation or a descendent partitioned table.
>
> I guess we're calling this one of; root, named, target :-(

Changed it to:
* either from the target relation or a descendent partitioned table.

>
> 17. You still have the following comment in ModifyTableState but
> you've moved all those fields out to PartitionTupleRouting:
>
> /* Tuple-routing support info */

This comment applies to mt_partition_tuple_routing field.

>
> 18. Should the following not be just called partKeyUpdate (without the 'd')?
>
> bool partKeyUpdated; /* some part key in hierarchy updated */
>
> This occurs in the planner were the part key is certainly being updated.
>

Actually the way it is named, it can mean : the partition key "is
updated" or "..has been updated" or "..is being updated" all of which
make sense. This sounds consistent with the name
RangeTblEntry->updatedCols that means "which of the columns are being
updated".

> 19. In pathnode.h you've named a parameter partColsUpdated, but the
> function in the .c file calls it partKeyUpdated.

Renamed partColsUpdated to partKeyUpdated.

>
> I'll try to look at the tests tomorrow and also do some testing. So
> far I've only read the code and the docs.

Thanks David. Your review is valuable.

> 20. "carried" -> "carried out the"
>
> + would have identified the newly updated row and carried
> + <command>UPDATE</command>/<command>DELETE</command> on this new row

Done.

>
> 21. Extra new line
>
> + <xref linkend="ddl-partitioning-declarative-limitations">.
> +
> </para>

Done.

I am not sure when exactly, but this line has started giving compile
errors, seemingly because > should be />. Fixed it.

>
> 22. In copy.c CopyFrom() you have the following code:
>
> /*
> * We might need to convert from the parent rowtype to the
> * partition rowtype.
> */
> map = proute->partition_tupconv_maps[leaf_part_index];
> if (map)
> {
> Relation partrel = resultRelInfo->ri_RelationDesc;
>
> tuple = do_convert_tuple(tuple, map);
>
> /*
> * We must use the partition's tuple descriptor from this
> * point on. Use a dedicated slot from this point on until
> * we're finished dealing with the partition.
> */
> slot = proute->partition_tuple_slot;
> Assert(slot != NULL);
> ExecSetSlotDescriptor(slot, RelationGetDescr(partrel));
> ExecStoreTuple(tuple, slot, InvalidBuffer, true);
> }
>
> Should this use ConvertPartitionTupleSlot() instead?

I will have a look at it to see if we can use
ConvertPartitionTupleSlot() without any changes.
(TODO)

>
> 23. Why write;
>
> last_resultRelInfo = mtstate->resultRelInfo + mtstate->mt_nplans;
>
> when you can write;
>
> last_resultRelInfo = mtstate->resultRelInfo[mtstate->mt_nplans];?

You meant : (with &)
> last_resultRelInfo = &mtstate->resultRelInfo[mtstate->mt_nplans];?

I think both are equally good, and equally readable. In this case, we
don't even want the array element, so why not just increment the
pointer to a particular offset.

>
>
> 24. In ExecCleanupTupleRouting(), do you think that you could just
> have a special case loop for (mtstate && mtstate->operation ==
> CMD_UPDATE)?
>
> /*
> * If this result rel is one of the UPDATE subplan result rels, let
> * ExecEndPlan() close it. For INSERT or COPY, this does not apply
> * because leaf partition result rels are always newly allocated.
> */
> if (is_update &&
> resultRelInfo >= first_resultRelInfo &&
> resultRelInfo < last_resultRelInfo)
> continue;
>
> Something like:
>
> if (mtstate && mtstate->operation == CMD_UPDATE)
> {
> ResultRelInfo *first_resultRelInfo = mtstate->resultRelInfo;
> ResultRelInfo *last_resultRelInfo =
> mtstate->resultRelInfo[mtstate->mt_nplans];
>
> for (i = 0; i < proute->num_partitions; i++)
> {
> ResultRelInfo *resultRelInfo = proute->partitions[i];
>
> /*
> * Leave any resultRelInfos that belong to the UPDATE's subplan
> * list. These will be closed during executor shutdown.
> */
> if (resultRelInfo >= first_resultRelInfo &&
> resultRelInfo < last_resultRelInfo)
> continue;
>
> ExecCloseIndices(resultRelInfo);
> heap_close(resultRelInfo->ri_RelationDesc, NoLock);
> }
> }
> else
> {
> for (i = 0; i < proute->num_partitions; i++)
> {
> ResultRelInfo *resultRelInfo = proute->partitions[i];
>
> ExecCloseIndices(resultRelInfo);
> heap_close(resultRelInfo->ri_RelationDesc, NoLock);
> }
> }

I thought it's not worth having two separate loops in order to reduce
one if(is_update) condition in case of inserts. Although we will have
one less is_update check per partition, the code is not running
per-row.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment	Content-Type	Size
0001-Refactor-CheckConstraint-related-code.patch	application/octet-stream	10.1 KB
update-partition-key_v34.patch	application/octet-stream	113.6 KB

In response to

Re: [HACKERS] UPDATE of partition key at 2018-01-04 07:54:29 from David Rowley

Responses

Re: [HACKERS] UPDATE of partition key at 2018-01-05 20:25:49 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alvaro Herrera	2018-01-05 12:15:36	Re: Enhance pg_stat_wal_receiver view to display connected host
Previous Message	Stas Kelvich	2018-01-05 12:08:33	Re: [HACKERS] Issues with logical replication