Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-09-14 12:06:58
Message-ID: CAFjFpRd=1venqLL7oGU=C1dEkuvk2DJgvF+7uKbnPHaum1mvHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 14, 2017 at 4:13 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Sep 13, 2017 at 12:56 PM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> I debugged what happens in case of query "select 1 from t1 union all
>> select 2 from t1;" with the current HEAD (without multi-level
>> expansion patch attached). It doesn't set partitioned_rels in Append
>> path that gets converted into Append plan. Remember t1 is a
>> multi-level partitioned table here with t1p1 as its immediate
>> partition and t1p1p1 as partition of t1p1. So, the
>> set_append_rel_pathlist() recurses once as shown in the following
>> stack trace.
>
> Nice debugging. I spent some time today looking at this and I think
> it's a bug in v10, and specifically in add_paths_to_append_rel(),
> which only sets partitioned_rels correctly when the appendrel is a
> partitioned rel, and not when it's a subquery RTE with one or more
> partitioned queries beneath it.
>
> Attached are two patches either one of which will fix it. First, I
> wrote mechanical-partrels-fix.patch, which just mechanically
> propagates partitioned_rels lists from accumulated subpaths into the
> list used to construct the parent (Merge)AppendPath. I wasn't entire
> happy with that, because it ends up building multiple partitioned_rels
> lists for the same RelOptInfo. That seems silly, but there's no
> principled way to avoid it; avoiding it amounts to hoping that all the
> paths for the same relation carry the same partitioned_rels list,
> which is uncomfortable.
>
> So then I wrote pcinfo-for-subquery.patch. That patch notices when an
> RTE_SUBQUERY appendrel is processed and accumulates the
> partitioned_rels of its immediate children; in case there can be
> multiple nested levels of subqueries before we get down to the actual
> partitioned rel, it also adds a PartitionedChildRelInfo for the
> subquery RTE, so that there's no need to walk the whole tree to build
> the partitioned_rels list at higher levels, just the immediate
> children. I find this fix a lot more satisfying. It adds less code
> and does no extra work in the common case.

Thanks a lot for the patch. I have included pcinfo-for-subquery.patch
in my patchset as the first patch with typo corrections suggested by
Amit Langote.

>
> Notice that the choice of fix we adopt has consequences for your
> 0001-Multi-level-partitioned-table-expansion.patch -- with
> mechanical-partrels-fix.patch, that patch could either associated all
> partitioned_rels with the top-parent or it could work level by level
> and everything would get properly assembled later. But with
> pcinfo-for-subquery.patch, we need everything associated with the
> top-parent. That doesn't seem like a problem to me, but it's
> something to note.
>

I have few changes to multi-level expansion patch as per discussion in
earlier mails
1. expand_single_inheritance_child() gets the top parent's PlanRowMark
from which it builds the child's PlanRowMark and also update
allMarkTypes of the top parent's PlanRowMark. The chlid's PlanRowMark
contains the RTI of the top parent, which is pulled from the top
parent's PlanRowMark. This is to keep the old behaviour intact.

2. Updated expand_single_inheritance_child's prologue to explain
various output arguments, per suggestion from Amit Langote. Also
included comments about the way we construct child PlanRowMark. Please
see if the comments look good.

3. As suggested by Amit Langote, with multi-level partitioned table
expansion, intermediate partitioned tables won't have pcinfo
associated them. So, that patch removes the assertion
Assert(list_length(partitioned_rels) >= 1) in
add_paths_to_append_rels(). I didn't remove that assertion from your
patch so that you could cherry-pick that commit to v10 where that
assertion holds true.

4. Fixed inheritance_planner() to use top parent's RTE to pull
partitioned_rels per discussion with Amit few mails back [1].

Please let me know if I have missed anything; it's been some long discussion.

Apart from this I have included fix to reparameterize parallel nested
loop paths as per discussion in [2].

Please note that I have removed the advanced partitioning patches from
the attached patchset since those need a rebase because of default
partition support.

[1] https://www.postgresql.org/message-id/CAFjFpRe62H0rTb4Rb7wOVSR25xfNW+mt1Ncp-OtzGaEtZBTLwA@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAJ3gD9ctVgv6r0-7B6js7Z5uPHXx+KA5jK-3=uFsGwKOXfTddg@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_dp_join_patches_v31.tar.gz application/x-gzip 135.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-09-14 12:23:39 Re: DROP SUBSCRIPTION hangs if sub is disabled in the same transaction
Previous Message Ashutosh Sharma 2017-09-14 11:57:27 Re: [PATCH] pageinspect function to decode infomasks