Re: [HACKERS] MERGE SQL Statement for PG11

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-23 12:33:57
Message-ID: 5a7e5749-3601-5e27-037e-731370d67e98@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2018/03/23 20:07, Pavan Deolasee wrote:
> On Fri, Mar 23, 2018 at 12:57 PM, Amit Langote wrote:
>> Also, it seems that the delta patch I sent in the last email didn't
>> contain all the changes I had to make. It didn't contain, for example,
>> replacing adjust_and_expand_inherited_tlist() with
>> adjust_partition_tlist(). I guess you'll know when you rebase anyway.
>>
>
> Yes, I am planning to fix that once the ON CONFLICT patch is
> ready/committed.

OK, thanks.

>> 3. I think the comment above this should be updated to explain why the
>> map_index is being set to the leaf index in case of MERGE instead of the
>> subplan index as is done in case of plain UPDATE:
>>
>> - map_index = resultRelInfo - mtstate->resultRelInfo;
>> - Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
>> - tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
>> + if (mtstate->operation == CMD_MERGE)
>> + {
>> + map_index = resultRelInfo->ri_PartitionLeafIndex;
>> + Assert(mtstate->rootResultRelInfo == NULL);
>> + tupconv_map = TupConvMapForLeaf(proute,
>> + mtstate->resultRelInfo,
>> + map_index);
>> + }
>> + else
>> + {
>>
>
> Done. I wonder though if we should just always set ri_PartitionLeafIndex
> even for regular UPDATE and always use that to retrieve the map.

In the regular UPDATE case, we'd be looking at a resultRelInfo that's from
the ModifyTableState's per-subplan result rel array and the
TupleConversionMap array perhaps accepts subscripts that are in range 0 to
mtstate->nplans - 1 (not 0 to nparts - 1), so using ri_PartitionLeafIndex
there would be incorrect, I think.

>> 4. Do you think it would be possible at a later late to change this junk
>> attribute to contain something other than "tableoid"?
>>
>> + if (operation == CMD_MERGE)
>> + {
>> + j->jf_otherJunkAttNo =
>> ExecFindJunkAttribute(j, "tableoid");
>> + if
>> (!AttributeNumberIsValid(j->jf_otherJunkAttNo))
>> + elog(ERROR, "could not find junk tableoid
>> column");
>> +
>> + }
>>
>> Currently, it seems ExecMergeMatched will take this OID and look up the
>> partition (its ResultRelInfo) by calling ExecFindPartitionByOid which in
>> turn looks it up in the PartitionTupleRouting struct. I'm imagining it
>> might be possible to instead return an integer that specifies "a partition
>> number". Of course, nothing like that exists currently, but just curious
>> if we're going to be "stuck" with this junk attribute always containing
>> "tableoid". Or maybe putting a "partition number" into the junk attribute
>> is not doable to begin with.
>>
>
> I am not sure. Wouldn't adding a new junk column require a whole new
> machinery? It might be worth adding it someday to reduce the cost
> associated with the lookups. But I don't want to include the change in this
> already largish patch.

No I wasn't suggesting that we do that in this patch. Also, I wasn't
saying that we store the "partition index" in a *new* junk column, but
*instead of* tableoid as this patch does. My question was whether we can
replace tableoid that we are going to store with this patch in the MERGE's
source plan's targetlist with something else in the future.

>> 5. In ExecInitModifyTable, in the if (node->mergeActionList) block:
>>
>> +
>> + /* initialize slot for the existing tuple */
>> + mtstate->mt_existing = ExecInitExtraTupleSlot(estate, NULL);
>>
>> Maybe a good idea to Assert that mt_existing is NULL before. Also, why
>> not set the slot's descriptor right away if tuple routing is not going to
>> be used. I did it that way in the ON CONFLICT DO UPDATE patch.
>>
>
> Yeah, I plan to update this code once the other patch gets in. This change
> was mostly borrowed from your/Alvaro's patch, but I don't think it will be
> part of the MERGE patch if the ON CONFLICT DO UPDATE patch gets in ahead of
> this.

OK.

>> 6. I see that there is a slot called mergeSlot that becomes part of
>> ResultRelInfo of the table (partition) via ri_MergeState. That means we
>> might end up creating as many slots as there are partitions (* number of
>> actions?). Can't we have just one, say, mt_mergeproj in ModifyTableState
>> similar to mt_conflproj and just reset its descriptor before use. I guess
>> reset will have happen before carrying out an action applied to a given
>> partition. When I tried that (see attached delta), nothing got broken.
>>
>>
> Thanks! It was on my TODO list. So thanks for taking care of it. I've
> included your patch in the main patch. I imagine we can similarly set the
> tuple descriptor for this slot during initialisation if target table is a
> non-partitioned table. But I shall take care of that along with
> mt_existing. In fact, I wonder if we should combine mt_confproj and
> mt_mergeproj and just have one slot. They are mutually exclusive in their
> use, but have lot in common.

OK.

> As someone who understands partitioning best, do you have any other
> comments/concerns regarding partitioning related code in the patch? I would
> appreciate if you can give it a look and also run any tests that you may
> have handy.

Actually, I don't yet understand the full scope of what MERGE is supposed
to do. I guess if it gives same answers for a partitioned table as it
does for regular tables for different MERGE commands, that's enough to say
that MERGE supports partitioning. But maybe there are corner cases where
MERGE doesn't work same with partitioning because of some underlying
implementation details (mostly executor, afaics). I will try to test more
early next week.

Thanks,
Amit

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Finnerty 2018-03-23 12:42:47 Re: Removing useless DISTINCT clauses
Previous Message Etsuro Fujita 2018-03-23 12:22:40 Re: [HACKERS] Add support for tuple routing to foreign partitions