Re: a misbehavior of partition row movement (?)

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Arne Roland <A(dot)Roland(at)index(dot)de>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: a misbehavior of partition row movement (?)
Date: 2022-01-19 07:13:13
Message-ID: CA+HiwqFuAi7zCEC_7dhOLoEQ_HTq4DOyp9-NKzfZVNRdFHSs3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 19, 2022 at 7:29 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> On 2022-Jan-18, Alvaro Herrera wrote:
> > On 2022-Jan-18, Amit Langote wrote:
> >
> > > Would you like me to update the patch with the above renumbering or
> > > are you working on a new version yourself?
> >
> > I have a few very minor changes apart from that. Let me see if I can
> > get this pushed soon; if I'm unable to (there are parts I don't fully
> > grok yet), I'll post what I have.
>
> Here's v13, a series with your patch as 0001 and a few changes from me;
> the bulk is just pgindent, and there are a few stylistic changes and an
> unrelated typo fix and I added a couple of comments to your new code.

Thank you.

> I don't like the API change to ExecInsert(). You're adding two output
> arguments:
> - the tuple being inserted. But surely you have this already, because
> it's in the 'slot' argument you passed in. ExecInsert is even careful to
> set the ->tts_tableOid argument there. So ExecInsert's caller doesn't
> need to receive the inserted tuple as an argument, it can just read
> 'slot'.

Hmm, ExecInsert()'s input slot belongs to either the source partition
or the "root" target relation, the latter due to the following stanza
in ExecCrossPartitionUpdate():

/*
* resultRelInfo is one of the per-relation resultRelInfos. So we should
* convert the tuple into root's tuple descriptor if needed, since
* ExecInsert() starts the search from root.
*/
tupconv_map = ExecGetChildToRootMap(resultRelInfo);
if (tupconv_map != NULL)
slot = execute_attr_map_slot(tupconv_map->attrMap,
slot,
mtstate->mt_root_tuple_slot);

Though the slot whose tuple ExecInsert() ultimately inserts may be
destination partition's ri_PartitionTupleSlot due to the following
stanza in it:

/*
* If the input result relation is a partitioned table, find the leaf
* partition to insert the tuple into.
*/
if (proute)
{
ResultRelInfo *partRelInfo;

slot = ExecPrepareTupleRouting(mtstate, estate, proute,
resultRelInfo, slot,
&partRelInfo);
resultRelInfo = partRelInfo;
}

It's not great that ExecInsert()'s existing header comment doesn't
mention that the slot whose tuple is actually inserted may not be the
slot it receives from the caller :-(.

> - the relation to which the tuple was inserted. Can't this be obtained
> by looking at slot->tts_tableOid? We should be able to use
> ExecLookupResultRelByOid() to obtain it, no? (I suppose you may claim
> that this is wasteful, but I think this is not a common case anyway and
> it's worth keeping ExecInsert()'s API clean for the sake of the 99.99%
> of its other calls).

ExecLookupResultRelByOid() is only useful when *all* relevant leaf
partitions are present in the ModifyTableState.resultRelInfo array
(due to being present in ModifyTable.resultRelations). Leaf
partitions that are only initialized by tuple routing (such as
destination partitions of cross-partition updates) are only present in
ModifyTableState.mt_partition_tuple_routing.partitions[] that are not
discoverable by ExecLookupResultRelByOid().

> I think the argument definition of ExecCrossPartitionUpdateForeignKey is
> a bit messy. I propose to move mtstate,estate as two first arguments;
> IIRC the whole executor does it that way.

Okay, done.

> AfterTriggerSaveEvent determines maybe_crosspart_update (by looking at
> mtstate->operation -- why doesn't it look at 'event' instead?) and later
> it determines new_event.ate_flags. Why can't it use
> maybe_crosspart_update to simplify part of that? Or maybe the other way
> around, not sure. It looks like something could be made simpler there.

Actually, I remember disliking maybe_crosspart_update for sounding a
bit fuzzy and also having to add mtstate to a bunch of trigger.c
interface functions only to calculate that.

I now wonder if passing an is_crosspart_update through
ExecAR{Update|Delete}Triggers() would not be better. Both
ExecDelete() and ExecCrossPartitionUpdateForeignKey() know for sure if
their ExecAR{Update|Delete}Triggers() invocation is for a
cross-partition update, so better to just pass that information down
to AfterTriggerSaveEvent() than pass 'mtstate' and have the latter
reverse-engineer only a fuzzy guess of whether that's the case.

I like that interface better and have implemented it in the updated version.

I've also merged your changes and made some of my own as mentioned
above to end up with the attached v14. I'm also attaching a delta
patch over v13 (0001+0002) to easily see the changes I made to get
v14.

BTW, your tweaks patch added this:

+ * *inserted_tuple is the tuple that's effectively inserted;
+ * *inserted_destrel is the relation where it was inserted.
+ * These are only set on success. FIXME -- see what happens on
the "do nothing" cases.

If by "do nothing cases" you mean INSERT ON CONFLICT ... DO NOTHING,
then I don't think it matters, because the caller in that case would
be ExecModifyTable() which doesn't care about inserted_tuple and
inserted_destrel.

> Overall, the idea embodied in the patch looks sensible to me.

Thanks again for taking time to review this.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
delta-over-v13.diff application/octet-stream 16.1 KB
v14-0001-Enforce-foreign-key-correctly-during-cross-parti.patch application/octet-stream 62.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2022-01-19 07:14:28 Re: Skipping logical replication transactions on subscriber side
Previous Message vignesh C 2022-01-19 06:32:19 Re: Skipping logical replication transactions on subscriber side