Re: Oddity in tuple routing for foreign partitions

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, alvherre(at)alvh(dot)no-ip(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Oddity in tuple routing for foreign partitions
Date: 2018-04-26 08:36:44
Message-ID: 5AE18F9C.6080805@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2018/04/26 15:35), Amit Langote wrote:
> On 2018/04/26 12:43, Etsuro Fujita wrote:
>> (2018/04/25 17:29), Amit Langote wrote:
>>> Hmm, I missed that we do need information from a proper RTE as well. So,
>>> I suppose we should be doing this instead of creating the RTE for foreign
>>> partition from scratch.
>>>
>>> + rte = list_nth(estate->es_range_table, resultRelation - 1);
>>> + rte = copyObject(rte);
>>> + rte->relid = RelationGetRelid(rel);
>>> + rte->relkind = RELKIND_FOREIGN_TABLE;
>>
>> As I said upthread, we can use the RTE in the range table as-is if the
>> foreign table is one of the UPDATE subplan partitions or the target
>> specified in a COPY command. So, I created the patch that way because
>> that would save cycles. Why not just use that RTE in those cases?
>
> Yes, I agree it's better to reuse the RTE in the cases we can. So, how
> does this look:
>
> + /*
> + * If the foreign table is a partition, we need to create a new RTE
> + * describing the foreign table for use by deparseInsertSql and
> + * create_foreign_modify() below, after first copying the parent's
> + * RTE and modifying some fields to describe the foreign partition to
> + * work on. However, if this is invoked by UPDATE, the existing RTE
> + * may already correspond to this partition if it is one of the
> + * UPDATE subplan target rels; in that case, we can just use the
> + * existing RTE as-is.
> + */
> + rte = list_nth(estate->es_range_table, resultRelation - 1);
> + if (rte->relid != RelationGetRelid(rel))
> + {
> + rte = copyObject(rte);
> + rte->relid = RelationGetRelid(rel);
> + rte->relkind = RELKIND_FOREIGN_TABLE;
> +
> + /*
> + * For UPDATE, we must use the RT index of the first subplan
> + * target rel's RTE, because the core code would have built
> + * expressions for the partition, such as RETURNING, using that
> + * RT index as varno of Vars contained in those expressions.
> + */
> + if (plan&& plan->operation == CMD_UPDATE&&
> + resultRelation == plan->nominalRelation)
> + resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex;
> + }

Seems like a better change than mine; because this simplifies the logic.

> I added the block inside the if because I agree with your comment below
> that the changes to ExecInitPartitionInfo are better kept for later to
> think more carefully about the dependencies it might break.

OK

>>> If we apply the other patch I proposed, resultRelation always points to
>>> the correct RTE.
>>>
>>>>> I tried to do that. So, attached are two patches.
>>>>>
>>>>> 1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
>>>>> InitResultRelInfo
>>>>>
>>>>> 2. v5 of the patch to fix the bug of foreign partitions
>>>>>
>>>>> Thoughts?
>>
>> Actually, I also thought the former when creating the patch, but I left
>> that as-is because I'm not sure that would be really safe; ExecConstraints
>> looks at the RT index via GetInsertedColumns/GetUpdatedColumns, so what I
>> thought was: that might cause unexpected behavior.
>
> OK, I have to agree here that we better leave 1 to be looked at later.
>
> After this change, GetInsertedColumns/GetUpdatedColumns will start
> returning a different set of columns in some cases than it does now.
> Currently, it *always* returns a set containing root table's attribute
> numbers, even for UPDATE. But with this change, for UPDATE, it will start
> returning the set containing the first subplan target rel's attribute
> numbers, which could be any partition with arbitrarily different attribute
> numbers.
>
>> Anyway, I think that
>> the former is more like an improvement rather than a fix, so it would be
>> better to leave that for another patch for PG12?
>
> I agree, so I'm dropping the patch for 1.

OK, let's focus on #2!

> See attached an updated version with changes as described above.

Looks good to me. Thanks for the updated version!

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vladimir Svedov 2018-04-26 08:40:01 Re: Racing DEADLOCK on PostgreSQL 9.3
Previous Message Amit Langote 2018-04-26 07:43:24 Re: Oddity in tuple routing for foreign partitions