Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Date: 2018-07-06 10:59:54
Message-ID: 5B3F4BAA.3050601@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2018/07/04 21:37), Ashutosh Bapat wrote:
> On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita
> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> (2018/07/04 19:04), Ashutosh Bapat wrote:
>>> On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
>>> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>> (2018/06/22 22:54), Ashutosh Bapat wrote:
>>>>> + if (enable_partitionwise_join&&
>>>>> rel->top_parent_is_partitioned)
>>>>> + {
>>>>> + build_childrel_tlist(root, rel, childrel, 1,&appinfo);
>>>>> + }
>>>>>
>>>>> Why do we need rel->top_parent_is_partitioned? If a relation is
>>>>> partitioned (if (rel->part_scheme), it's either the top parent or is
>>>>> partition of some other partitioned table. In either case this
>>>>> condition will be true.

>>>> This would be needed to avoid unnecessarily applying build_childrel_tlist
>>>> to
>>>> child rels of a partitioned table for which we don't consider
>>>> partitionwise
>>>> join. Consider:
>>>>
>>>> postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
>>>> CREATE TABLE
>>>> postgres=# create table lpt_p1 partition of lpt for values in (1);
>>>> CREATE TABLE
>>>> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
>>>> CREATE TABLE
>>>> postgres=# create table test (c1 int, c2 text);
>>>> CREATE TABLE
>>>> postgres=# explain verbose select * from (select * from lpt union all
>>>> select
>>>> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);

>> I might misunderstand your words, but in the above example the patch doesn't
>> apply build_childrel_tlist to lpt_p1 and lpt_p2. The reason for that is
>> because we can avoid adjusting the tlists for the corresponding subplans at
>> plan creation time so that whole-row Vars in the tlists are transformed into
>> CREs. I think the overhead of the adjustment is not that big, but not zero,
>> so it would be worth avoiding applying build_childrel_tlist to partitions if
>> the top parent won't participate in a partitionwise-join at all.
>
> I don't think that answers my question. When we join lpt with test,
> your patch will apply build_childrel_tlist() to lpt_p1 and lpt_p2 even
> when join between lpt and test is not going to use partition-wise
> join. Why?

Maybe my explanation including the example was not good. Sorry about
that, but my patch will *not* apply build_childrel_tlist to lpt_p1 and
lpt_p2 since the top parent of lpt_p1 and lpt_p2 is the UNION ALL
subquery and hence not a partitioned table (ie, we have
rel->top_parent_is_partitioned=false for lpt_p1 and lpt_p2).

> As per your explanation, the condition "if
> (enable_partitionwise_join&& rel->top_parent_is_partitioned)" is
> used to avoid applying build_childrel_tlist() when partition-wise join
> won't be possible. But it's not covering all the cases.

Let me explain about that: 1) my patch won't apply that function to a
child if its top parent is an appendrel built from a UNION ALL subquery,
even though the child is a partition of a partitioned table pulled up
from a leaf subquery into the parent query, like lpt_p1, and 2) my patch
will apply that function to a child if its top parent is a partitioned
table, whether or not the partitioned table is involved in a
partitionwise join. By #1, we avoid the adjustment work at plan
creation time, as explained above. It might be worth trying to be
smarter about #2 (for example, in the case of a join of a partitioned
table and a non-partitioned table, since we don't consider a
partitionwise join for that join, it's better to not apply that function
to partitions of the partitioned table, to avoid the adjustment work at
plan creation time), but ISTM we don't have enough information to be
smarter.

>>> An
>>> in-between state will produce a hell lot of confusion for any further
>>> optimization. Whenever we change the code around partition-wise
>>> operations in future, we will have to check whether or not a given
>>> child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I
>>> have mentioned earlier, I am also not comfortable with the targetlist
>>> of child relations being type inconsistent with that of the parent,
>>> which is a fundamental rule in inheritance. Worst keep them
>>> inconsistent during path creation and make them consistent at the time
>>> of creating plans. A child's whole-row Var has datatype of the child
>>> where as that of parent has datatype of parent.
>>
>>
>> I don't see any critical issue here. Could you elaborate a bit more on that
>> point?
>
> I think breaking a fundamental rule like this itself is critical. But
> interestingly I am not able to find a case where it becomes a problem.
> But may be I haven't tried enough. And the question is if it's not
> required to have the targetlists type consistent, why even bother with
> ConvertRowtypeExpr addition there? We can use your approach of adding
> ConvertRowtypeExpr at the end in all the cases.

I think that the tlist of a (not-the-topmost) child relation doesn't
need to be type-consistent with that of the parent; it has only to
include all Vars that are needed for higher joinquals and final output
to the parent appendrel. (In other words, I think we can build the
tlist in the same manner as we build tlists of base or join relations in
the main join tree.) On the other hand, other expressions such as WHERE
quals need to be type-consistent with those of the parent; else we would
create a plan that produces the wrong result. So, with my patch we
carry out special handling for the tlist; it includes a child whole-row
Var, if needed, instead of a CRE converting the Var to the parent
rowtype. That allows us to create/process child-join plans using the
existing planner functions as-is.

>>> A ConvertRowtypeExpr
>>> is used to fix this inconsistency. That's why I chose to use
>>> pull_var_clause() as a place to fix the problem and not fix
>>> ConvertRowtypeExpr in targetlist itself.
>>
>>
>> I think the biggest issue in the original patch you proposed is that we
>> spend extra cycles where partitioning is not involved, which is the biggest
>> reason why I think the original patch isn't the right way to go.
>
> When there are no partitions involved, there won't be any
> ConvertRowtypeExprs there which means the function
> is_converted_whole_row_reference() would just return from the first
> line checking IsA() and nullness of node.

Really? IIRC, with the original patch we would spend extra cycles in a
non-partitioned inheritance processing [1].

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5AFC0865.8050802%40lab.ntt.co.jp

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2018-07-06 11:01:07 Re: [HACKERS] Crash on promotion when recovery.conf is renamed
Previous Message Pavel Stehule 2018-07-06 09:57:47 Re: How to remove elements from array .