Re: [HACKERS] UPDATE of partition key

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] UPDATE of partition key
Date: 2017-11-21 11:54:35
Message-ID: CAJ3gD9f4Um99sOJmVSEPj783VWw5GbXkgU3OWcYZJv=ipjTkAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The following contains replies to David's remaining comments , i.e.
from #27 onwards, followed by replies to Alvaro's review comments.

Attached is the revised patch v25.

=====================

On 13 November 2017 at 18:25, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
> 27. Comment does not explain how we're skipping checking the partition
> constraint check in:
>
> * We have already checked partition constraints above, so skip
> * checking them here.
>
> Maybe something like:
>
> * We've already checked the partition constraint above, however, we
> * must still ensure the tuple passes all other constraints, so we'll
> * call ExecConstraints() and have it validate all remaining checks.

Done.

>
> 28. For table WITH OIDs, the OID should probably follow the new tuple
> for partition-key-UPDATEs.
>

I understand that as far as possible we want to simulate the UPDATE as
if it's a normal table update. But for system columns, I think we
should avoid that; and instead, let the system handle it the way it is
handling (i.e. the new row in a table should always have a new OID.)

> 29. ExecSetupChildParentMap gets called here for non-partitioned relations.
> Maybe that's not the best function name? The function only seems to do
> that when perleaf is True.

I didn't clearly understand this, particularly, what task you were
referring to when you said "the function only seems to do that" ? The
function does setup child-parent map even when perleaf=false. The
function name is chosen that way because the map is always a
child-to-root map, but the map array elements may be arranged in the
order of the per-partition array 'mtstate->mt_partitions[]', or in the
order of the per-subplan result rels 'mtstate->resultRelInfo[]'

>
> Is a leaf a partition of a partitioned table? It's not that clear the
> meaning here.

Leaf partition means it is a child of a partitioned table, but it
itself is not a partitioned table.

I have added more comments for the function ExecSetupChildParentMap()
(both, at the function header and inside). Please check and let me
know if you still have questions.

>
> 30. The following chunk of code is giving me a headache trying to
> verify which arrays are which size:
>
> ExecSetupPartitionTupleRouting(rel,
> mtstate->resultRelInfo,
> (operation == CMD_UPDATE ? nplans : 0),
> node->nominalRelation,
> estate,
> &partition_dispatch_info,
> &partitions,
> &partition_tupconv_maps,
> &subplan_leaf_map,
> &partition_tuple_slot,
> &num_parted, &num_partitions);
> mtstate->mt_partition_dispatch_info = partition_dispatch_info;
> mtstate->mt_num_dispatch = num_parted;
> mtstate->mt_partitions = partitions;
> mtstate->mt_num_partitions = num_partitions;
> mtstate->mt_parentchild_tupconv_maps = partition_tupconv_maps;
> mtstate->mt_subplan_partition_offsets = subplan_leaf_map;
> mtstate->mt_partition_tuple_slot = partition_tuple_slot;
> mtstate->mt_root_tuple_slot = MakeTupleTableSlot();
>
> I know this patch is not completely responsible for it, but you're not
> making things any better.
>
> Would it not be better to invent some PartitionTupleRouting struct and
> make that struct a member of ModifyTableState and CopyState, then just
> pass the pointer to that struct to ExecSetupPartitionTupleRouting()
> and have it fill in the required details? I think the complexity of
> this is already on the high end, I think you really need to do the
> refactor before this gets any worse.
>

Ok. I am currently working on doing this change. So not yet included
in the attached patch. Will send yet another revised patch for this
change. (TODO)

>
> 31. The following code seems incorrect:
>
> /*
> * If this is an UPDATE and a BEFORE UPDATE trigger is present, we may
> * need to do update tuple routing.
> */
> if (resultRelInfo->ri_TrigDesc &&
> resultRelInfo->ri_TrigDesc->trig_update_before_row &&
> operation == CMD_UPDATE)
> update_tuple_routing_needed = true;
>
> Shouldn't this be setting update_tuple_routing_needed to false if
> there are no before row update triggers? Otherwise, you're setting it
> to true regardless of if there are any partition key columns being
> UPDATEd. That would make the work you're doing in
> inheritance_planner() to set part_cols_updated a waste of time.

The point of setting it to true regardless of whether the partition
key is updated is : even if partition key is not explicitly modified
by the UPDATE, a before-row trigger can update it later. But we can
never know whether it is actually going to update. So if there are BR
UPDATE triggers on result rels of any of the subplans, we *always*
setup the tuple routing. This approach was concluded in the earlier
discussions about trigger handling.

>
> Also, this bit of code is a bit confused.
>
> /* Decide whether we need to perform update tuple routing. */
> if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
> update_tuple_routing_needed = false;
>
> /*
> * Build state for tuple routing if it's an INSERT or if it's an UPDATE of
> * partition key.
> */
> if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
> (operation == CMD_INSERT || update_tuple_routing_needed))
>
>
> The first if test would not be required if you fixed the code where
> you set update_tuple_routing_needed = true regardless if its a
> partitioned table or not.

The place where I set update_tuple_routing_needed to true
unconditionally, we don't have the relation open, so we don't know
whether it is a partitioned table. Hence, set it anyways, and then
revert it to false if it's not a partitioned table after all.

>
> So basically, you need to take the node->part_cols_updated from the
> planner, if that's true then perform your test for before row update
> triggers, set a bool to false if there are none, then proceed to setup
> the partition tuple routing for partition table inserts or if your
> bool is still true. Right?

I think if we look at "update_tuple_routing_needed" as meaning that
update tuple routing *may be* required, then the logic as-is makes
sense: Set the variable if we see that we may require to do update
routing. And the conditions for that are : either node->partKeyUpdated
is true, or there is a BR UPDATE trigger and the operation is UPDATE.
So set this variable for those conditions, and revert it back to false
later if it is found that it's not a partitioned table.

So I have retained the existing logic in the patch, but with some
additional comments to make this logic clear to the reader.

>
> 32. "WCO" abbreviation is not that common and might need to be expanded.
>
> * Below are required as reference objects for mapping partition
> * attno's in expressions such as WCO and RETURNING.
>
> Searching for other comments which mention "WCO" they're all around
> places that is easy to understand they mean "With Check Option", e.g.
> next to a variable with a more descriptive name. That's not the case
> here.

Ok. Changed WCO to WithCheckOptions.

>
> 33. "are anyway newly allocated", should "anyway" be "always"?
> Otherwise, it does not make sense.
>

OK. Changed this :
* because all leaf partition result rels are anyway newly allocated.
to this (also removed 'all') :
* because leaf partition result rels are always newly allocated.

>
> 34. Comment added which mentions a member that does not exist.
>
> * all_part_cols contains all attribute numbers from the parent that are
> * used as partitioning columns by the parent or some descendent which is
> * itself partitioned.
> *

Oops. Left-overs from earlier patch. Removed the comment.

=====================

Below are Alvaro's review comments :

On 14 November 2017 at 22:22, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> David Rowley wrote:
>
>> 5. Triggers. Do we need a new "TG_" tag to allow trigger functions to
>> do something special when the DELETE/INSERT is a partition move? I
>> have audit tables in mind here it may appear as though a user
>> performed a DELETE when they actually performed an UPDATE giving
>> visibility of this to the trigger function will allow the application
>> to work around this.
>
> +1 I think we do need a flag that can be inspected from the user
> trigger function.

What I feel is : it's too early to do such changes. I think we should
first get in the core patch, and then consider this request and any
further enhancements.

>
>> 9. Also, this should likely be reworded:
>>
>> * 'num_update_rri' : number of UPDATE per-subplan result rels. For INSERT,
>> * this is 0.
>>
>> 'num_update_rri' number of elements in 'update_rri' array or zero for INSERT.
>
> Also:
>
> /pgsql/source/master/src/backend/executor/execMain.c: In function 'ExecSetupPartitionTupleRouting':
> /pgsql/source/master/src/backend/executor/execMain.c:3401:18: warning: 'leaf_part_arr' may be used uninitialized in this function [-Wmaybe-uninitialized]
> leaf_part_rri = leaf_part_arr + i;
> ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
>

Right. I have now made "leaf_part_arr = NULL" during declaration.
Actually leaf_part_arr is used only for inserts; but for compiler-sake
we should add this initialization.

> I think using num_update_rri==0 as a flag to indicate INSERT is strange.
> I suggest passing an additional boolean --

I think adding another param looks redundant. To make the condition
more readable, I have introduced a new local variable :
bool is_update = (num_update_rri > 0);

> or maybe just split the whole
> function in two, one for updates and another for inserts, say
> ExecSetupPartitionTupleRoutingForInsert() and
> ExecSetupPartitionTupleRoutingForUpdate(). They seem to
> share almost no code, and the current flow is hard to read; maybe just
> add a common subroutine for the lower bottom of the loop.

So there are two common code sections. One is the initial code which
initializes various arrays and output params. And the 2nd common code
is the 2nd half of the for loop block that includes calls to
heap_open(), InitResultRelInfo(), convert_tuples_by_name(),
CheckValidResultRel() and others. So it looks like there is a lot of
common code. We would need to have two functions, one to have the
initialization code, and the other to run the later half of the loop.
And, heap_open() and InitResultRelInfo() need to be called only if
partrel (which needs to be passed as function param) is NULL. Rather
than this, I think this condition is better placed in-line in
ExecSetupPartitionTupleRouting() for clarity. I am feeling it's not
worth doing the shuffling. We are extracting the code into two
functions only to avoid the "if num_update_rri" conditions.

That's why I feel having a "is_update" variable would solve the
purpose. The hard-to-understand code, I presume, is the update part
where it tries to re-use already-existing result resl, and this part
would anyways remain, although in a separate function.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
update-partition-key_v25.patch application/octet-stream 112.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message amul sul 2017-11-21 11:57:01 Re: [HACKERS] Parallel Append implementation
Previous Message Kyotaro HORIGUCHI 2017-11-21 11:53:04 Re: Failed to delete old ReorderBuffer spilled files