partition routing layering in nodeModifyTable.c

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: partition routing layering in nodeModifyTable.c
Date: 2019-07-18 01:09:11
Message-ID: 20190718010911.l6xcdv6birtxiei4@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

While discussing partition related code with David in [1], I again was
confused by the layering of partition related code in
nodeModifyTable.c.

1) How come partition routing is done outside of ExecInsert()?

case CMD_INSERT:
/* Prepare for tuple routing if needed. */
if (proute)
slot = ExecPrepareTupleRouting(node, estate, proute,
resultRelInfo, slot);
slot = ExecInsert(node, slot, planSlot,
estate, node->canSetTag);
/* Revert ExecPrepareTupleRouting's state change. */
if (proute)
estate->es_result_relation_info = resultRelInfo;
break;

That already seems like a layering violation, but it's made worse by
ExecUpdate() having its partition handling solely inside - including another
call to ExecInsert(), including the surrounding partition setup code.

And even worse, after all that, ExecInsert() still contains partitioning
code.

It seems to me that if we just moved the ExecPrepareTupleRouting() into
ExecInsert(), we could remove the duplication.

2) The contents of the
/*
* If a partition check failed, try to move the row into the right
* partition.
*/
if (partition_constraint_failed)

block ought to be moved to a separate function (maybe
ExecCrossPartitionUpdate or ExecMove). ExecUpdate() is already
complicated enough without dealing with the partition move.

3) How come we reset estate->es_result_relation_info after partition
routing, but not the mtstate wide changes by
ExecPrepareTupleRouting()? Note that its comment says:

* Caller must revert the estate changes after executing the insertion!
* In mtstate, transition capture changes may also need to be reverted.

ExecUpdate() contains

/*
* Updates set the transition capture map only when a new subplan
* is chosen. But for inserts, it is set for each row. So after
* INSERT, we need to revert back to the map created for UPDATE;
* otherwise the next UPDATE will incorrectly use the one created
* for INSERT. So first save the one created for UPDATE.
*/
if (mtstate->mt_transition_capture)
saved_tcs_map = mtstate->mt_transition_capture->tcs_map;

but as I read the code, that's not really true? It's
ExecPrepareTupleRouting() that does so, and that's called directly in ExecUpdate().

4)
/*
* If this insert is the result of a partition key update that moved the
* tuple to a new partition, put this row into the transition NEW TABLE,
* if there is one. We need to do this separately for DELETE and INSERT
* because they happen on different tables.
*/
ar_insert_trig_tcs = mtstate->mt_transition_capture;
if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
&& mtstate->mt_transition_capture->tcs_update_new_table)
{
ExecARUpdateTriggers(estate, resultRelInfo, NULL,
NULL,
slot,
NULL,
mtstate->mt_transition_capture);

/*
* We've already captured the NEW TABLE row, so make sure any AR
* INSERT trigger fired below doesn't capture it again.
*/
ar_insert_trig_tcs = NULL;
}

/* AFTER ROW INSERT Triggers */
ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
ar_insert_trig_tcs);

Besides not using the just defined ar_insert_trig_tcs and instead
repeatedly referring to mtstate->mt_transition_capture, wouldn't this be
a easier to understand if the were an if/else, instead of resetting
ar_insert_trig_tcs? If the block were

/*
* triggers behave differently depending on this being a delete as
* part of a partion move, or a deletion proper.
if (mtstate->operation == CMD_UPDATE)
{
/*
* If this insert is the result of a partition key update that moved the
* tuple to a new partition, put this row into the transition NEW TABLE,
* if there is one. We need to do this separately for DELETE and INSERT
* because they happen on different tables.
*/
ExecARUpdateTriggers(estate, resultRelInfo, NULL,
NULL,
slot,
NULL,
mtstate->mt_transition_capture);

/*
* But we do want to fire plain per-row INSERT triggers on the
* new table. By not passing in transition_capture we prevent
* ....
*/
ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
NULL);
}
else
{
/* AFTER ROW INSERT Triggers */
ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
ar_insert_trig_tcs);
}

it seems like it'd be quite a bit clearer (although I do think the
comments also need a fair bit of polishing independent of this proposed
change).

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/CAKJS1f-YObQJTbncGJGRZ6gSFiS%2Bgw_Y5kvrpR%3DvEnFKH17AVA%40mail.gmail.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-07-18 01:16:32 Re: refactoring - share str2*int64 functions
Previous Message Edmund Horner 2019-07-18 00:48:08 Re: Tid scan improvements