Re: ON CONFLICT DO UPDATE for partitioned tables

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: ON CONFLICT DO UPDATE for partitioned tables
Date: 2018-03-22 04:34:51
Message-ID: CABOikdOQ3iLmSWpgmYnOmMXo05cG8rRDWpajVmEE8jnXeQF-=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 20, 2018 at 10:09 AM, Amit Langote <
Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> On 2018/03/20 13:30, Amit Langote wrote:
> > I have incorporated your patch in the main patch after updating the
> > comments a bit. Also, now that 6666ee49f49 is in [1], the transition
> > table related tests I proposed yesterday pass nicely. Instead of posting
> > as a separate patch, I have merged it with the main patch. So now that
> > planner refactoring is unnecessary, attached is just one patch.
>
> Sorry, I forgot to remove a hunk in the patch affecting
> src/include/optimizer/prep.h. Fixed in the attached updated version.
>

Thanks for writing a new version. A few comments:

<listitem>
<para>
- Using the <literal>ON CONFLICT</literal> clause with partitioned
tables
- will cause an error if the conflict target is specified (see
- <xref linkend="sql-on-conflict" /> for more details on how the
clause
- works). Therefore, it is not possible to specify
- <literal>DO UPDATE</literal> as the alternative action, because
- specifying the conflict target is mandatory in that case. On the
other
- hand, specifying <literal>DO NOTHING</literal> as the alternative
action
- works fine provided the conflict target is not specified. In that
case,
- unique constraints (or exclusion constraints) of the individual leaf
- partitions are considered.
- </para>
- </listitem>

We should document it somewhere that partition key update is not supported
by
ON CONFLICT DO UPDATE

/*
* get_partition_parent
+ * Obtain direct parent or topmost ancestor of given relation
*
- * Returns inheritance parent of a partition by scanning pg_inherits
+ * Returns direct inheritance parent of a partition by scanning
pg_inherits;
+ * or, if 'getroot' is true, the topmost parent in the inheritance
hierarchy.
*
* Note: Because this function assumes that the relation whose OID is
passed
* as an argument will have precisely one parent, it should only be called
* when it is known that the relation is a partition.
*/

Given that most callers only look for immediate parent, I wonder if it makes
sense to have a new function, get_partition_root(), instead of changing
signature of the current function. That will reduce foot-print of this patch
quite a lot.

@@ -36,6 +38,7 @@ static char
*ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
int maxfieldlen);
+static List *adjust_onconflictset_tlist(List *tlist, TupleConversionMap
*map);

We should name this function in a more generic way, given that it's going
to be
used for other things too. What about adjust_partition_tlist?

+
+ /*
+ * If partition's tuple descriptor differs from the root parent,
+ * we need to adjust the onConflictSet target list to account for
+ * differences in attribute numbers.
+ */
+ if (map != NULL)
+ {
+ /*
+ * First convert Vars to contain partition's atttribute
+ * numbers.
+ */
+
+ /* Convert Vars referencing EXCLUDED pseudo-relation. */
+ onconflset = map_partition_varattnos(node->onConflictSet,
+ INNER_VAR,
+ partrel,
+ firstResultRel, NULL);

Are we not modifying node->onConflictSet in place? Or does
map_partition_varattnos() create a fresh copy before scribbling on the
input?
If it's former then I guess that's a problem. If it's latter then we ought
to
have comments explaining that.

+ tupDesc = ExecTypeFromTL(onconflset, partrelDesc->tdhasoid);
+ PinTupleDesc(tupDesc);

Why do we need to pin the descriptor? If we do need, why don't we need
corresponding ReleaseTupDesc() call?

I am still confused if the partition_conflproj_tdescs really belongs to
PartitionTupleRouting or should it be a part of the ResultRelInfo. FWIW for
the
MERGE patch, I moved everything to a new struct and made it part of the
ResultRelInfo. If no re-mapping is necessary, we can just point to the
struct
in the root relation's ResultRelInfo. Otherwise create and populate a new
one
in the partition relation's ResultRelInfo.

+ leaf_part_rri->ri_onConflictSetWhere =
+ ExecInitQual(onconflwhere, &mtstate->ps);
+ }

So ri_onConflictSetWhere and ri_onConflictSetProj are part of the
ResultRelInfo. Why not move mt_conflproj_tupdesc,
partition_conflproj_tdescs,
partition_arbiter_indexes etc to the ResultRelInfo as well?

+
+/*
+ * Adjust the targetlist entries of an inherited ON CONFLICT DO UPDATE
+ * operation for a given partition
+ *

As I said above, we should disassociate this function from ON CONFLICT DO
UPDATE and rather have it as a general purpose facility.

+ * The expressions have already been fixed, but we have to make sure that
the
+ * target resnos match the partition. In some cases, this can force us to
+ * re-order the tlist to preserve resno ordering.
+ *

Can we have some explanation regarding how the targetlist is reordered? I
know
the function does that by updating the resno in place, but some explanation
would help. Also, should we add an assertion-build check to confirm that the
resultant list is actually ordered?

@@ -66,7 +67,8 @@ static TupleTableSlot
*ExecPrepareTupleRouting(ModifyTableState *mtstate,
EState *estate,
PartitionTupleRouting *proute,
ResultRelInfo *targetRelInfo,
- TupleTableSlot *slot);
+ TupleTableSlot *slot,
+ int *partition_index);
static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
static void ExecSetupChildParentMapForTcs(ModifyTableState *mtstate);
static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
@@ -264,6 +266,7 @@ ExecInsert(ModifyTableState *mtstate,
TupleTableSlot *slot,
TupleTableSlot *planSlot,
EState *estate,
+ int partition_index,
bool canSetTag)
{
HeapTuple tuple;

If we move the list of arbiter indexes and the tuple desc to ResultRelInfo,
as
suggested above, I think we can avoid making any API changes to
ExecPrepareTupleRouting and ExecInsert.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2018-03-22 04:36:07 Re: [HACKERS] Partition-wise aggregation/grouping
Previous Message Amit Langote 2018-03-22 04:07:05 Re: [HACKERS] path toward faster partition pruning