Re: UPDATE of partition key

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UPDATE of partition key
Date: 2017-06-21 09:28:20
Message-ID: c36905be-27be-bb4a-11cc-0be4b83aca4a@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/06/21 3:53, Robert Haas wrote:
> On Tue, Jun 20, 2017 at 2:54 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>>> I guess I don't see why it should work like this. In the INSERT case,
>>> we must build withCheckOption objects for each partition because those
>>> partitions don't appear in the plan otherwise -- but in the UPDATE
>>> case, they're already there, so why do we need to build anything at
>>> all? Similarly for RETURNING projections. How are the things we need
>>> for those cases not already getting built, associated with the
>>> relevant resultRelInfos? Maybe there's a concern if some children got
>>> pruned - they could turn out later to be the children into which
>>> tuples need to be routed. But the patch makes no distinction
>>> between possibly-pruned children and any others.
>>
>> Yes, only a subset of the partitions appear in the UPDATE subplans. I
>> think typically for updates, a very small subset of the total leaf
>> partitions will be there in the plans, others would get pruned. IMHO,
>> it would not be worth having an optimization where it opens only those
>> leaf partitions which are not already there in the subplans. Without
>> the optimization, we are able to re-use the INSERT infrastructure
>> without additional changes.
>
> Well, that is possible, but certainly not guaranteed. I mean,
> somebody could do a whole-table UPDATE, or an UPDATE that hits a
> smattering of rows in every partition; e.g. the table is partitioned
> on order number, and you do UPDATE lineitem SET product_code = 'K372B'
> WHERE product_code = 'K372'.
>
> Leaving that aside, the point here is that you're rebuilding
> withCheckOptions and returningLists that have already been built in
> the planner. That's bad for two reasons. First, it's inefficient,
> especially if there are many partitions. Second, it will amount to a
> functional bug if you get a different answer than the planner did.
> Note this comment in the existing code:
>
> /*
> * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
> * that we didn't build the withCheckOptionList for each partition within
> * the planner, but simple translation of the varattnos for each partition
> * will suffice. This only occurs for the INSERT case; UPDATE/DELETE
> * cases are handled above.
> */
>
> The comment "UPDATE/DELETE cases are handled above" is referring to
> the code that initializes the WCOs generated by the planner. You've
> modified the comment in your patch, but the associated code: your
> updated comment says that only "DELETEs and local UPDATES are handled
> above", but in reality, *all* updates are still handled above. And
> then they are handled again here. Similarly for returning lists.
> It's certainly not OK for the comment to be inaccurate, but I think
> it's also bad to redo the work which the planner has already done,
> even if it makes the patch smaller.

I guess this has to do with the UPDATE turning into DELETE+INSERT. So, it
seems like WCOs are being initialized for the leaf partitions
(ResultRelInfos in the mt_partitions array) that are in turn are
initialized for the aforementioned INSERT. That's why the term "...local
UPDATEs" in the new comment text.

If that's true, I wonder if it makes sense to apply what would be
WCO_RLS_UPDATE_CHECK to a leaf partition that the tuple will be moved into
by calling ExecInsert()?

> Also, I feel like it's probably not correct to use the first result
> relation as the nominal relation for building WCOs and returning lists
> anyway. I mean, if the first result relation has a different column
> order than the parent relation, isn't this just broken? If it works
> for some reason, the comments don't explain what that reason is.

Yep, it's more appropriate to use
ModifyTableState->rootResultRelationInfo->ri_RelationDesc somehow. That
is, if answer to the question I raised above is positive.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message sanyam jain 2017-06-21 09:30:02 Re: Logical decoding on standby
Previous Message Amit Langote 2017-06-21 08:57:55 Re: Useless code in ExecInitModifyTable