Re: [HACKERS] UPDATE of partition key

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: [HACKERS] UPDATE of partition key
Date: 2018-01-02 11:26:31
Message-ID: CAJ3gD9eRNOmv-=1ea8O_4c4fg_jVgaPy=p=zXvkp_0cW7ohtCA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 16 December 2017 at 03:09, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> - map = ptr->partition_tupconv_maps[leaf_part_index];
> + map = ptr->parentchild_tupconv_maps[leaf_part_index];
>
> I don't think there's any reason to rename this. In previous patch
> versions, you had multiple arrays of tuple conversion maps in this
> structure, but the refactoring eliminated that.

Done in an earlier version of the patch.

>
> Likewise, I'm not sure I get the point of mt_transition_tupconv_maps
> -> mt_childparent_tupconv_maps. That seems like it could similarly be
> left alone.

We need to change it's name because now this map is not only used for
transition capture, but also for update-tuple-routing. Does it look ok
for you if, for readability, we keep the childparent tag ? Or else, we
can just make it "mt_tupconv_maps", but "mt_childparent_tupconv_maps"
looks more informative.

>
> + /*
> + * If transition tables are the only reason we're here, return. As
> + * mentioned above, we can also be here during update tuple routing in
> + * presence of transition tables, in which case this function is called
> + * separately for oldtup and newtup, so either can be NULL, not both.
> + */
> if (trigdesc == NULL ||
> (event == TRIGGER_EVENT_DELETE && !trigdesc->trig_delete_after_row) ||
> (event == TRIGGER_EVENT_INSERT && !trigdesc->trig_insert_after_row) ||
> - (event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row))
> + (event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row) ||
> + (event == TRIGGER_EVENT_UPDATE && ((oldtup == NULL) ^ (newtup == NULL))))
>
> I guess this is correct, but it seems awfully fragile. Can't we have
> some more explicit signaling about whether we're only here for
> transition tables, rather than deducing it based on exactly one of
> oldtup and newtup being NULL?

I had given a thought on this earlier. I felt, even the pre-existing
conditions like "!trigdesc->trig_update_after_row" are all indirect
ways to determine that this function is called only to capture
transition tables, and thought that it may have been better to have
separate parameter transition_table_only.

But then decided that I can continue on similar lines and add another
such condition to indicate that we are only capturing update-routed
tuples.

Instead of adding another parameter to AfterTriggerSaveEvent(), I had
also considered another approach: Put the transition-tuples-capture
logic part of AfterTriggerSaveEvent() into a helper function
CaptureTransitionTables(). In ExecInsert() and ExecDelete(), instead
of calling ExecARUpdateTriggers(), call this function
CaptureTransitionTables(). I then dropped this idea and thought rather
to call ExecARUpdateTriggers() which neatly does the required checks
and other things like locking the old tuple via GetTupleForTrigger().
So if we go by CaptureTransitionTables(), we would need to do what
ExecARUpdateTriggers() does before calling CaptureTransitionTables().
This is doable. If you think this is worth doing so as to get rid of
the "(oldtup == NULL) ^ (newtup == NULL)" condition, we can do that.

>
> + /* Initialization specific to update */
> + if (mtstate && mtstate->operation == CMD_UPDATE)
> + {
> + ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
> +
> + is_update = true;
> + update_rri = mtstate->resultRelInfo;
> + num_update_rri = list_length(node->plans);
> + }
>
> I guess I don't see why we need a separate "if" block for this.
> Neither is_update nor update_rri nor num_update_rri are used until we
> get to the block that begins with "if (is_update)". Why not just
> change that block to test if (mtstate && mtstate->operation ==
> CMD_UPDATE)" and put the rest of these initializations inside that
> block?

Done.

>
> + int num_update_rri = 0,
> + update_rri_index = 0;
> ...
> + update_rri_index = 0;
>
> It's already 0.

Done. Retained the comment that mentions why we need to set it to 0,
and added a note in the end that we have already done this during
initialization.

>
> + leaf_part_rri = &update_rri[update_rri_index];
> ...
> + leaf_part_rri = leaf_part_arr + i;
>
> These are doing the same kind of thing, but using different styles. I
> prefer the former style, so I'd change the second one to
> &leaf_part_arr[i]. Alternatively, you could change the first one to
> update_rri + update_rri_indx. But it's strange to see the same
> variable initialized in two different ways just a few lines apart.
>

Done. Used the first style.

>
> +static HeapTuple
> +ConvertPartitionTupleSlot(ModifyTableState *mtstate,
> + TupleConversionMap *map,
> + HeapTuple tuple,
> + TupleTableSlot *new_slot,
> + TupleTableSlot **p_my_slot)
>
> This function doesn't use the mtstate argument at all.

Removed mtstate.

>
> + * (Similarly we need to add the deleted row in OLD TABLE). We need to do
>
> The period should be before, not after, the closing parenthesis.

Done.

>
> + * Now that we have already captured NEW TABLE row, any AR INSERT
> + * trigger should not again capture it below. Arrange for the same.
>
> A more American style would be something like "We've already captured
> the NEW TABLE row, so make sure any AR INSERT trigger fired below
> doesn't capture it again." (Similarly for the other case.)

Done.

>
> + /* The delete has actually happened, so inform that to the caller */
> + if (tuple_deleted)
> + *tuple_deleted = true;
>
> In the US, we inform the caller, not inform that to the caller. In
> other words, here the direct object of "inform" is the person or thing
> getting the information (in this case, "the caller"), not the
> information being conveyed (in this case, "that"). I realize your
> usage is probably typical for your country...

Changed it to "inform the caller about the same"

>
> + Assert(mtstate->mt_is_tupconv_perpart == true);
>
> We usually just Assert(thing_that_should_be_true), not
> Assert(thing_that_should_be_true == true).

Ok. Changed it to Assert(mtstate->mt_is_tupconv_perpart)

>
> + * In case this is part of update tuple routing, put this row into the
> + * transition OLD TABLE if we are capturing transition tables. We need to
> + * do this separately for DELETE and INSERT because they happen on
> + * different tables.
>
> Maybe "...OLD table, but only if we are..."
>
> Should it be capturing transition tables or capturing transition
> tuples? I'm not sure.

Changed it to "capturing transition tuples". In trigger.c, I see this
short form notation as well as a long-form notation like "capturing
tuples in transition tables". But not seen anywhere "capturing
transition tables", and it does seem odd.

>
> + * partition, in which case, we should check the RLS CHECK policy just
>
> In the US, the second comma in this sentence is incorrect and should be removed.

Done.

>
> + * When an UPDATE is run with a leaf partition, we would not have
> + * partition tuple routing setup. In that case, fail with
>
> run with -> run on
> would not -> will not
> setup -> set up

Done.

>
> + * deleted by another transaction), then we should skip INSERT as
> + * well, otherwise, there will be effectively one new row inserted.
>
> skip INSERT -> skip the insert
> well, otherwise -> well; otherwise
>
> I would also change "there will be effectively one new row inserted"
> to "an UPDATE could cause an increase in the total number of rows
> across all partitions, which is clearly wrong".

Done both.

>
> + /*
> + * UPDATEs set the transition capture map only when a new subplan
> + * is chosen. But for INSERTs, it is set for each row. So after
> + * INSERT, we need to revert back to the map created for UPDATE;
> + * otherwise the next UPDATE will incorrectly use the one created
> + * for INESRT. So first save the one created for UPDATE.
> + */
> + if (mtstate->mt_transition_capture)
> + saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
>
> UPDATEs -> Updates
Done. I believe you want to do this only if it's a plural ? In the
same para, also changed "INSERTs" to "inserts".
> INESRT -> INSERT
Done.

>
> + * 2. For capturing transition tables that are partitions. For UPDATEs, we need
>
> This isn't worded well. A transition table is never a partition;
> transition tables and partitions are two different kinds of things.

Yeah. Changed it to :
"For capturing transition tuples when the target table is a partitioned table."

Attached v32 patch.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
update-partition-key_v32.tar.gz application/x-gzip 32.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2018-01-02 11:38:55 Re: [HACKERS] why not parallel seq scan for slow functions
Previous Message Pavel Stehule 2018-01-02 11:05:53 Re: [HACKERS] SQL/JSON in PostgreSQL