Re: a misbehavior of partition row movement (?)

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Arne Roland <A(dot)Roland(at)index(dot)de>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: a misbehavior of partition row movement (?)
Date: 2022-01-19 07:59:00
Message-ID: CA+HiwqFOQN17X+PsuQrA_BCe5OJF=Ep-um8MdzaSoqDVh_QNqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 19, 2022 at 4:13 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> On Wed, Jan 19, 2022 at 7:29 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > On 2022-Jan-18, Alvaro Herrera wrote:
> > > On 2022-Jan-18, Amit Langote wrote:
> > >
> > > > Would you like me to update the patch with the above renumbering or
> > > > are you working on a new version yourself?
> > >
> > > I have a few very minor changes apart from that. Let me see if I can
> > > get this pushed soon; if I'm unable to (there are parts I don't fully
> > > grok yet), I'll post what I have.
> >
> > Here's v13, a series with your patch as 0001 and a few changes from me;
> > the bulk is just pgindent, and there are a few stylistic changes and an
> > unrelated typo fix and I added a couple of comments to your new code.
>
> Thank you.
>
> > I don't like the API change to ExecInsert(). You're adding two output
> > arguments:
> > - the tuple being inserted. But surely you have this already, because
> > it's in the 'slot' argument you passed in. ExecInsert is even careful to
> > set the ->tts_tableOid argument there. So ExecInsert's caller doesn't
> > need to receive the inserted tuple as an argument, it can just read
> > 'slot'.
>
> Hmm, ExecInsert()'s input slot belongs to either the source partition
> or the "root" target relation, the latter due to the following stanza
> in ExecCrossPartitionUpdate():
>
> /*
> * resultRelInfo is one of the per-relation resultRelInfos. So we should
> * convert the tuple into root's tuple descriptor if needed, since
> * ExecInsert() starts the search from root.
> */
> tupconv_map = ExecGetChildToRootMap(resultRelInfo);
> if (tupconv_map != NULL)
> slot = execute_attr_map_slot(tupconv_map->attrMap,
> slot,
> mtstate->mt_root_tuple_slot);
>
> Though the slot whose tuple ExecInsert() ultimately inserts may be
> destination partition's ri_PartitionTupleSlot due to the following
> stanza in it:
>
> /*
> * If the input result relation is a partitioned table, find the leaf
> * partition to insert the tuple into.
> */
> if (proute)
> {
> ResultRelInfo *partRelInfo;
>
> slot = ExecPrepareTupleRouting(mtstate, estate, proute,
> resultRelInfo, slot,
> &partRelInfo);
> resultRelInfo = partRelInfo;
> }
>
> It's not great that ExecInsert()'s existing header comment doesn't
> mention that the slot whose tuple is actually inserted may not be the
> slot it receives from the caller :-(.
>
> > - the relation to which the tuple was inserted. Can't this be obtained
> > by looking at slot->tts_tableOid? We should be able to use
> > ExecLookupResultRelByOid() to obtain it, no? (I suppose you may claim
> > that this is wasteful, but I think this is not a common case anyway and
> > it's worth keeping ExecInsert()'s API clean for the sake of the 99.99%
> > of its other calls).
>
> ExecLookupResultRelByOid() is only useful when *all* relevant leaf
> partitions are present in the ModifyTableState.resultRelInfo array
> (due to being present in ModifyTable.resultRelations). Leaf
> partitions that are only initialized by tuple routing (such as
> destination partitions of cross-partition updates) are only present in
> ModifyTableState.mt_partition_tuple_routing.partitions[] that are not
> discoverable by ExecLookupResultRelByOid().
>
> > I think the argument definition of ExecCrossPartitionUpdateForeignKey is
> > a bit messy. I propose to move mtstate,estate as two first arguments;
> > IIRC the whole executor does it that way.
>
> Okay, done.
>
> > AfterTriggerSaveEvent determines maybe_crosspart_update (by looking at
> > mtstate->operation -- why doesn't it look at 'event' instead?) and later
> > it determines new_event.ate_flags. Why can't it use
> > maybe_crosspart_update to simplify part of that? Or maybe the other way
> > around, not sure. It looks like something could be made simpler there.
>
> Actually, I remember disliking maybe_crosspart_update for sounding a
> bit fuzzy and also having to add mtstate to a bunch of trigger.c
> interface functions only to calculate that.
>
> I now wonder if passing an is_crosspart_update through
> ExecAR{Update|Delete}Triggers() would not be better. Both
> ExecDelete() and ExecCrossPartitionUpdateForeignKey() know for sure if
> their ExecAR{Update|Delete}Triggers() invocation is for a
> cross-partition update, so better to just pass that information down
> to AfterTriggerSaveEvent() than pass 'mtstate' and have the latter
> reverse-engineer only a fuzzy guess of whether that's the case.
>
> I like that interface better and have implemented it in the updated version.
>
> I've also merged your changes and made some of my own as mentioned
> above to end up with the attached v14. I'm also attaching a delta
> patch over v13 (0001+0002) to easily see the changes I made to get
> v14.

Oops, broke the cfbot's patch-applier again. Delta-diff reattached as txt.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
delta-over-v13.diff.txt text/plain 16.1 KB
v14-0001-Enforce-foreign-key-correctly-during-cross-parti.patch application/octet-stream 62.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2022-01-19 08:01:17 Re: Schema variables - new implementation for Postgres 15
Previous Message Masahiko Sawada 2022-01-19 07:15:56 Re: Skipping logical replication transactions on subscriber side