Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Date: 2018-07-04 12:06:11
Message-ID: 5B3CB833.4040005@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2018/07/04 19:04), Ashutosh Bapat wrote:
> On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> (2018/06/22 22:54), Ashutosh Bapat wrote:
>>> + if (enable_partitionwise_join&& rel->top_parent_is_partitioned)
>>> + {
>>> + build_childrel_tlist(root, rel, childrel, 1,&appinfo);
>>> + }
>>>
>>> Why do we need rel->top_parent_is_partitioned? If a relation is
>>> partitioned (if (rel->part_scheme), it's either the top parent or is
>>> partition of some other partitioned table. In either case this
>>> condition will be true.
>>
>>
>> This would be needed to avoid unnecessarily applying build_childrel_tlist to
>> child rels of a partitioned table for which we don't consider partitionwise
>> join. Consider:
>>
>> postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
>> CREATE TABLE
>> postgres=# create table lpt_p1 partition of lpt for values in (1);
>> CREATE TABLE
>> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
>> CREATE TABLE
>> postgres=# create table test (c1 int, c2 text);
>> CREATE TABLE
>> postgres=# explain verbose select * from (select * from lpt union all select
>> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);
>
> --- plan clipped
>
>>
>> In this example, the top parent is not a partitioned table and we don't need
>> to consider partitionwise join for that, so we don't need to apply that to
>> the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2).
>
> FWIW, the test is not sufficient. In the above example, a simple
> select * from lpt inner join test where lpt.c1 = test.c1 would not use
> partition-wise join technique. Why not to avoid build_childrel_tlist()
> in that case as well?

I might misunderstand your words, but in the above example the patch
doesn't apply build_childrel_tlist to lpt_p1 and lpt_p2. The reason for
that is because we can avoid adjusting the tlists for the corresponding
subplans at plan creation time so that whole-row Vars in the tlists are
transformed into CREs. I think the overhead of the adjustment is not
that big, but not zero, so it would be worth avoiding applying
build_childrel_tlist to partitions if the top parent won't participate
in a partitionwise-join at all.

> Worst we can change the criteria to use
> partition-wise join in future e.g. above case would use partition-wise
> join by 1. merging lpt_p1 into corresponding partition of lpt so that
> ss is partitioned and 2. repartitioning test or joining test with each
> partition of lpt separately. In those cases the changes you are doing
> here need to be reverted and put somewhere else. There's already a
> patch to reverse the order of join and grouping out there. How would
> this work with that.

Interesting, but that seems like a matter of PG12+.

> In general I think set_append_rel_size() or build_simple_rel() is not
> the place where we should decide whether the given relation will
> participate in a PWJ. Also we should not selectively add a
> ConvertRowtypeExpr on top of a whole-row Var of some a child relation.
> We should do it for all the child rels or shouldn't do it at all.

One thing I thought was to apply build_childrel_tlist for all the child
rels whether its top parent is a partitioned table or not, but as I
mentioned above, that would incur needless overhead for adjusting the
tlists for the child rels that don't involve in a partitionwise join
such as lpt_p1 and lpt_p2, which I think is not good.

> An
> in-between state will produce a hell lot of confusion for any further
> optimization. Whenever we change the code around partition-wise
> operations in future, we will have to check whether or not a given
> child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I
> have mentioned earlier, I am also not comfortable with the targetlist
> of child relations being type inconsistent with that of the parent,
> which is a fundamental rule in inheritance. Worst keep them
> inconsistent during path creation and make them consistent at the time
> of creating plans. A child's whole-row Var has datatype of the child
> where as that of parent has datatype of parent.

I don't see any critical issue here. Could you elaborate a bit more on
that point?

> A ConvertRowtypeExpr
> is used to fix this inconsistency. That's why I chose to use
> pull_var_clause() as a place to fix the problem and not fix
> ConvertRowtypeExpr in targetlist itself.

I think the biggest issue in the original patch you proposed is that we
spend extra cycles where partitioning is not involved, which is the
biggest reason why I think the original patch isn't the right way to go.

> I am also not comfortable in just avoiding ConvertRowtypeExprs in
> targetlist and leave them as is in the conditions and ECs etc. This
> means searching a ConvertRowtypeExpr e.g. for creating pathkeys in
> targetlist will fail at the time of path creation but will succeed at
> the time of plan creation.
>
> This is turning more invasive that my approach of fixing it through PVC.

Sorry, I don't understand this. Could you show me places where the
problem happens?

>>> + /* No work if the child plan is an Append or MergeAppend */
>>> + if (IsA(subplan, Append) || IsA(subplan, MergeAppend))
>>> + return;
>>>
>>> Why? Probably it's related to the answer to the first question, But I
>>> don't see the connection. Given that partition-wise join will be
>>> applicable level by level, we need to recurse in
>>> adjust_subplan_tlist().
>>
>>
>> I don't think so; I think if the child plan is an Append or MergeAppend, the
>> tlist for each subplan of the Append or MergeAppend would have already been
>> adjusted while create_plan_recurse before we are called here.
>
> Ok. Thanks for the clarification. I think we should add a comment.

Will do.

>>> + /* The child plan should be able to do projection */
>>> + Assert(is_projection_capable_plan(subplan));
>>> +
>>> Again why? A MergeAppend's subplan could be a Sort plan, which will
>>> not be projection capable.
>>
>>
>> IIUC, since we are called here right after create_plan_recurse, the child
>> plan would be a scan or join unless it's neither Append nor MergeAppend. So
>> it should be projection-capable. Maybe I'm missing something, though.
>
> That's not true. add_paths_to_append_rel() adds sort paths on scan if
> necessary and creates merge append path.

Really? I think create_merge_append_path called from
generate_mergeappend_paths called from add_paths_to_append_rel uses a
dummy sort path just to compute the cost, but I don't think
create_merge_append_path (or generate_mergeappend_paths or
add_paths_to_append_rel) insert a sort path to a scan (or join) path.

Thanks for the comments!

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2018-07-04 12:08:13 Locking considerations of REINDEX
Previous Message Andrew Dunstan 2018-07-04 11:55:53 Re: [HACKERS] WAL logging problem in 9.4.3?