Quick Links

Re: UPDATE of partition key

From:	Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: UPDATE of partition key
Date:	2017-08-04 16:58:00
Message-ID:	CAJ3gD9eu6-4pLpxxaMTn21eTF0zPtMxba6he4FyV81nPPMgRjQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

>
> Below are the TODOS at this point :
>
> Fix for bug reported by Rajkumar about update with join.

I had explained the root issue of this bug here : [1]

Attached patch includes the fix, which is explained below.
Currently in the patch, there is a check if the tuple is concurrently
deleted by other session, i.e. when heap_update() returns
HeapTupleUpdated. In such case we set concurrently_deleted output
param to true. We should also do the same for HeapTupleSelfUpdated
return value.

In fact, there are other places in ExecDelete() where it can return
without doing anything. For e.g. if a BR DELETE trigger prevents the
delete from happening, ExecBRDeleteTriggers() returns false, in which
case ExecDelete() returns.

So what the fix does is : rename concurrently_deleted parameter to
delete_skipped so as to indicate a more general status : whether
delete has actually happened or was it skipped. And set this param to
true only after the delete happens. This allows us to avoid adding a
new rows for the trigger case also.

Added test scenario for UPDATE with JOIN case, and also TRIGGER case.

> Do something about two separate mapping tables for Transition tables
> and update tuple-routing.
On 1 July 2017 at 03:15, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Would make sense to have a set of functions with names like
> GetConvertor{From,To}{Subplan,Leaf}(mtstate, index) which build arrays
> m_convertors_{from,to}_by_{subplan,leaf} the first time they need
> them?

This was discussed here : [2]. I think even if we have them built when
needed, still in presence of both tuple routing and transition tables,
we do need separate arrays. So I think rather than dynamic arrays, we
can have static arrays but their elements will point to a shared
TupleConversionMap structure whenever possible.
As already in the patch, in case of insert/update tuple routing, there
is a per-leaf partition mt_transition_tupconv_maps array for
transition tables, and a separate per-subplan arry mt_resultrel_maps
for update tuple routing. *But*, what I am proposing is: for the
mt_transition_tupconv_maps[] element for which the leaf partition also
exists as a per-subplan result, that array element and the
mt_resultrel_maps[] element will point to the same TupleConversionMap
structure.

This is quite similar to how we are re-using the per-subplan
resultrels for the per-leaf result rels. We will re-use the
per-subplan TupleConversionMap for the per-leaf
mt_transition_tupconv_maps[] elements.

Not yet implemented this.

> GetUpdatedColumns() to be moved to header file.

Done. I have moved it in execnodes.h

> More test scenarios in regression tests.
> Need to check/test whether we are correctly applying insert policies
> (ant not update) while inserting a routed tuple.

Yet to do above two.

> Use getASTriggerResultRelInfo() for attrno mapping, rather than first
> resultrel, for generating child WCO/RETURNING expression.
>

Regarding generating child WithCheckOption and Returning expressions
using those of the root result relation, ModifyTablePath and
ModifyTable should have new fields rootReturningList (and
rootWithCheckOptions) which would be derived from
root->parse->returningList in inheritance_planner(). But then, similar
to per-subplan returningList, rootReturningList would have to pass
through set_plan_refs()=>set_returning_clause_references() which
requires the subplan targetlist to be passed. Because of this, for
rootReturningList, we require a subplan for root partition, which is
not there currently; we have subpans only for child rels. That means
we would have to create such plan only for the sake of generating
rootReturningList.

The other option is to do the way the patch is currently doing in the
executor by using the returningList of the first per-subplan result
rel to generate the other child returningList (and WithCheckOption).
This is working by applying map_partition_varattnos() to the first
returningList. But now that we realized that we have to specially
handle whole-row vars, map_partition_varattnos() would need some
changes to convert whole row vars differently for
child-rel-to-child-rel mapping. For childrel-to-childrel conversion,
the whole-row var is already wrapped by ConvertRowtypeExpr, but we
need to change its Var->vartype to the new child vartype.

I think the second option looks easier, but I am open to suggestions,
and I am myself still checking the first one.

> Address Robert's review comments on make_resultrel_ordered.patch.
>
> +typedef struct ParentChild
>
> This is a pretty generic name. Pick something more specific and informative.

I have used ChildPartitionInfo. But suggestions welcome.

>
> +static List *append_rel_partition_oids(List *rel_list, Relation rel);
>
> One could be forgiven for thinking that this function was just going
> to append OIDs, but it actually appends ParentChild structures, so I
> think the name needs work.

Renamed it to append_child_partitions().

>
> +List *append_rel_partition_oids(List *rel_list, Relation rel)
>
> Style. Please pgindent your patches.

I have pgindent'ed changes in nodeModifyTable.c and partition.c, yet
to do that for others.

>
> +#ifdef DEBUG_PRINT_OIDS
> + print_oids(*leaf_part_oids);
> +#endif
>
> I'd just rip out this debug stuff once you've got this working, but if
> we keep it, it certainly can't have a name as generic as print_oids()
> when it's actually doing something with a list of ParentChild
> structures. Also, it prints names, not OIDs. And DEBUG_PRINT_OIDS is
> no good for the same reasons.

Now that I have tested it , I have removed this. Also, the ordered
subplans printed in explain output serve the same purpose.

>
> + if (RelationGetPartitionDesc(rel))
> + walker->rels_list = append_rel_partition_oids(walker->rels_list, rel);
>
> Every place that calls append_rel_partition_oids guards that call with
> if (RelationGetPartitionDesc(...)). It seems to me that it would be
> simpler to remove those tests and instead just replace the
> Assert(partdesc) inside that function with if (!partdesc) return;

Done.

>
> Is there any real benefit in this "walker" interface? It looks to me
> like it might be simpler to just change things around so that it
> returns a list of OIDs, like find_all_inheritors, but generated
> differently. Then if you want bound-ordering rather than
> OID-ordering, you just do this:
>
> list_free(inhOids);
> inhOids = get_partition_oids_in_bound_order(rel);
>
> That'd remove the need for some if/then logic as you've currently got
> in get_next_child().

Have explained this here :
https://www.postgresql.org/message-id/CAJ3gD9dQ2FKes8pP6aM-4Tx3ngqWvD8oyOJiDRxLVoQiY76t0A%40mail.gmail.com
I am aware that this might get changed once we checkin a separate
patch just floated to expand inheritence in bound order.

>
> + is_partitioned_resultrel =
> + (oldrelation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE
> + && rti == parse->resultRelation);
>
> I suspect this isn't correct for a table that contains wCTEs, because
> there would in that case be multiple result relations.
>
> I think we should always expand in bound order rather than only when
> it's a result relation.
Have changed it to always expand in bound order for partitioned table.

[1]. https://www.postgresql.org/message-id/CAKcux6%3Dz38gH4K6YAFi%2BYvo5tHTwBL4tam4VM33CAPZ5dDMk1Q%40mail.gmail.com

[2] https://www.postgresql.org/message-id/CAEepm=3sc_j1zwqDYrbU4DTfX5rHcaMNNuaXRKWZFgt9m23OcA@mail.gmail.com

Attachment	Content-Type	Size
update-partition-key_v14.patch	application/octet-stream	91.7 KB

In response to

Re: UPDATE of partition key at 2017-07-28 17:45:46 from Amit Khandekar

Responses

Re: UPDATE of partition key at 2017-08-07 06:24:40 from Rajkumar Raghuwanshi
Re: UPDATE of partition key at 2017-08-11 05:14:58 from Amit Khandekar

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2017-08-04 17:25:55	Re: expanding inheritance in partition bound order
Previous Message	Michael Paquier	2017-08-04 16:06:38	Re: pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL