Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-09-08 17:38:47
Message-ID: CAFjFpRfHkJW3G=_PnSUc6PbXJE48AWYwyRzaGqtfKzzoU4wXXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 8, 2017 at 11:17 AM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> On 2017/09/08 4:04, Robert Haas wrote:
>> On Tue, Sep 5, 2017 at 7:01 AM, Ashutosh Bapat
>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>> accumulate_append_subpath() is executed for every path instead of
>>> every relation, so changing it would collect the same list multiple
>>> times. Instead, I found the old way of associating all intermediate
>>> partitions with the root partitioned relation work better. Here's the
>>> updated patch set.
>>
>> When I tried out patch 0001, it crashed repeatedly during 'make check'
>> because of an assertion failure in get_partitioned_child_rels. It
>> seemed to me that the way the patch was refactoring
>> expand_inherited_rtentry involved more code rearrangement than
>> necessary, so I reverted all the code rearrangement and just kept the
>> functional changes, and all the crashes went away. (That refactoring
>> also failed to initialize has_child properly.)
>
> When I tried the attached patch, it doesn't seem to expand partitioning
> inheritance in step-wise manner as the patch's title says. I think the
> rewritten patch forgot to include Ashutosh's changes to
> expand_single_inheritance_child() whereby the AppendRelInfo of the child
> will be marked with the direct parent instead of always the root parent.

Right. If we apply 0002 from partition-wise join patchset, which has
changed build_simple_rel() to collect direct children of a given
partitioned table, it introduces another crash because of Assertion
failure; for a partitioned table build_simple_rel() finds more
children than expected because indirect children are also counted as
direct children.

>
> I updated the patch to include just those changes. I'm not sure about
> one of the Ashutosh's changes whereby the child PlanRowMark is also passed
> to expand_partitioned_rtentry() to use as the parent PlanRowMark. I think
> the child RTE, child RT index and child Relation are fine, because they
> are necessary for creating AppendRelInfos in a desired way for later
> planning steps. But PlanRowMarks are not processed within the planner
> afterwards and do not need to be marked with the immediate parent-child
> association in the same way that AppendRelInfos need to be.

Passing top parent's row mark works today, since there is no
parent-child specific translation happening there. But if in future,
we introduce such a translation, row marks for indirect children in a
multi-level partitioned hierarchy won't be accurate. So, I think it's
better to pass row marks of the direct parent.

>
> I also included the changes to add_paths_to_append_rel() from my patch on
> the "path toward faster partition pruning" thread. We'd need that change,
> because while add_paths_to_append_rel() is called for all partitioned
> table RTEs in a given partition tree, expand_inherited_rtentry() would
> have set up a PartitionedChildRelInfo only for the root parent, so
> get_partitioned_child_rels() would not find the same for non-root
> partitioned table rels and crash failing the Assert. The changes I made
> are such that we call get_partitioned_child_rels() only for the parent
> rels that are known to correspond root partitioned tables (or as you
> pointed out on the thread, "the table named in the query" as opposed those
> added to the query as result of inheritance expansion). In addition to
> the relkind check on the input RTE, it would seem that checking that the
> reloptkind is RELOPT_BASEREL would be enough. But actually not, because
> if a partitioned table is accessed in a UNION ALL query, reloptkind even
> for the root partitioned table (the table named in the query) would be
> RELOPT_OTHER_MEMBER_REL. The only way to confirm that the input rel is
> actually the root partitioned table is to check whether its parent rel is
> not RTE_RELATION, because the parent in case of UNION ALL Append is a
> RTE_SUBQUERY RT entry.
>

There was a change in my 0003 patch, which fixed the crash. It checked
for RELOPT_BASEREL and RELKIND_PARTITIONED_TABLE. I have pulled it in
my 0001 patch. It no more crashes. I tried various queries involving
set operations and bare multi-level partitioned table scan with my
patch, but none of them showed any anomaly. Do you have a testcase
which shows problem with my patch? May be your suggestion is correct,
but corresponding code implementation is slightly longer than I would
expect. So, we should go with it, if there is corresponding testcase
which shows why it's needed.

In your patch
+ parent_rel = root->simple_rel_array[parent_relid];
+ get_pcinfo = (parent_rel->rtekind == RTE_SUBQUERY);
Do you mean RTE_RELATION as you explained above?

>> One thing I notice is that if I rip out the changes to initsplan.c,
>> the new regression test still passes. If it's possible to write a
>> test that fails without those changes, I think it would be a good idea
>> to include one in the patch. That's certainly one of the subtler
>> parts of this patch, IMHO.
>
> Back when this (step-wise expansion of partition inheritance) used to be a
> patch in the original declarative partitioning patch series, Ashutosh had
> reported a test query [1] that would fail getting a plan, for which we
> came up with the initsplan.c changes in this patch as the solution:
>
> ERROR: could not devise a query plan for the given query
>
> I tried that query again without the initsplan.c changes and somehow the
> same error does not occur anymore. It's strange because without the
> initsplan.c changes, there is no way for partitions lower in the tree than
> the first level to get the direct_lateral_relids and lateral_relids from
> the root parent rel. Maybe, Ashutosh has a way to devise the failing
> query again.
>

Thanks a lot for the reference. I devised a testcase slightly
modifying my original test. I have included the test in the latest
patch set.

I have included Robert's changes to parts other than
expand_inherited_rtentry() in the patch.

>
> I also confirmed that the partition-pruning patch set works fine with this
> patch instead of the patch on that thread with the same functionality,
> which I will now drop from that patch set. Sorry about the wasted time.
>

Thanks a lot. Please review the patch in the updated patchset.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_dp_join_patches_v30.tar.gz application/x-gzip 166.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-09-08 17:41:19 Re: [bug fix] Savepoint-related statements terminates connection
Previous Message Alexander Kuzmenkov 2017-09-08 17:37:29 Re: [POC] Faster processing at Gather node