Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Date: 2018-07-04 10:04:46
Message-ID: CAFjFpRf55MT-9NnaRwp0wkDUTuwb++UA_un9Twai=qNVePh-TQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
<fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> (2018/06/22 22:54), Ashutosh Bapat wrote:
>>
>> I have started reviewing the patch.
>
>
> Thanks for the review!
>
>> + if (enable_partitionwise_join&& rel->top_parent_is_partitioned)
>> + {
>> + build_childrel_tlist(root, rel, childrel, 1,&appinfo);
>> + }
>>
>> Why do we need rel->top_parent_is_partitioned? If a relation is
>> partitioned (if (rel->part_scheme), it's either the top parent or is
>> partition of some other partitioned table. In either case this
>> condition will be true.
>
>
> This would be needed to avoid unnecessarily applying build_childrel_tlist to
> child rels of a partitioned table for which we don't consider partitionwise
> join. Consider:
>
> postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
> CREATE TABLE
> postgres=# create table lpt_p1 partition of lpt for values in (1);
> CREATE TABLE
> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
> CREATE TABLE
> postgres=# create table test (c1 int, c2 text);
> CREATE TABLE
> postgres=# explain verbose select * from (select * from lpt union all select
> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);

--- plan clipped

>
> In this example, the top parent is not a partitioned table and we don't need
> to consider partitionwise join for that, so we don't need to apply that to
> the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2).

FWIW, the test is not sufficient. In the above example, a simple
select * from lpt inner join test where lpt.c1 = test.c1 would not use
partition-wise join technique. Why not to avoid build_childrel_tlist()
in that case as well? Worst we can change the criteria to use
partition-wise join in future e.g. above case would use partition-wise
join by 1. merging lpt_p1 into corresponding partition of lpt so that
ss is partitioned and 2. repartitioning test or joining test with each
partition of lpt separately. In those cases the changes you are doing
here need to be reverted and put somewhere else. There's already a
patch to reverse the order of join and grouping out there. How would
this work with that.

In general I think set_append_rel_size() or build_simple_rel() is not
the place where we should decide whether the given relation will
participate in a PWJ. Also we should not selectively add a
ConvertRowtypeExpr on top of a whole-row Var of some a child relation.
We should do it for all the child rels or shouldn't do it at all. An
in-between state will produce a hell lot of confusion for any further
optimization. Whenever we change the code around partition-wise
operations in future, we will have to check whether or not a given
child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I
have mentioned earlier, I am also not comfortable with the targetlist
of child relations being type inconsistent with that of the parent,
which is a fundamental rule in inheritance. Worst keep them
inconsistent during path creation and make them consistent at the time
of creating plans. A child's whole-row Var has datatype of the child
where as that of parent has datatype of parent. A ConvertRowtypeExpr
is used to fix this inconsistency. That's why I chose to use
pull_var_clause() as a place to fix the problem and not fix
ConvertRowtypeExpr in targetlist itself.

I am also not comfortable in just avoiding ConvertRowtypeExprs in
targetlist and leave them as is in the conditions and ECs etc. This
means searching a ConvertRowtypeExpr e.g. for creating pathkeys in
targetlist will fail at the time of path creation but will succeed at
the time of plan creation.

This is turning more invasive that my approach of fixing it through PVC.

>
>> + /* No work if the child plan is an Append or MergeAppend */
>> + if (IsA(subplan, Append) || IsA(subplan, MergeAppend))
>> + return;
>>
>> Why? Probably it's related to the answer to the first question, But I
>> don't see the connection. Given that partition-wise join will be
>> applicable level by level, we need to recurse in
>> adjust_subplan_tlist().
>
>
> I don't think so; I think if the child plan is an Append or MergeAppend, the
> tlist for each subplan of the Append or MergeAppend would have already been
> adjusted while create_plan_recurse before we are called here.

Ok. Thanks for the clarification. I think we should add a comment.

>
>> + /* The child plan should be able to do projection */
>> + Assert(is_projection_capable_plan(subplan));
>> +
>> Again why? A MergeAppend's subplan could be a Sort plan, which will
>> not be projection capable.
>
>
> IIUC, since we are called here right after create_plan_recurse, the child
> plan would be a scan or join unless it's neither Append nor MergeAppend. So
> it should be projection-capable. Maybe I'm missing something, though.

That's not true. add_paths_to_append_rel() adds sort paths on scan if
necessary and creates merge append path.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2018-07-04 10:06:43 Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Previous Message Pavel Raiskup 2018-07-04 09:54:22 [PATCH] btree_gist: fix union implementation for variable length columns