Re: Partition-wise join for join between (declaratively) partitioned tables

From: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-09-15 08:39:29
Message-ID: CAOGQiiO4+ez-MKtcKiTUOqbN8_TFLotto9sNZEjfATVi+0FVSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 14, 2017 at 8:27 AM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> On 2017/09/14 7:43, Robert Haas wrote:
>> On Wed, Sep 13, 2017 at 12:56 PM, Ashutosh Bapat
>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>> I debugged what happens in case of query "select 1 from t1 union all
>>> select 2 from t1;" with the current HEAD (without multi-level
>>> expansion patch attached). It doesn't set partitioned_rels in Append
>>> path that gets converted into Append plan. Remember t1 is a
>>> multi-level partitioned table here with t1p1 as its immediate
>>> partition and t1p1p1 as partition of t1p1. So, the
>>> set_append_rel_pathlist() recurses once as shown in the following
>>> stack trace.
>>
>> Nice debugging.
>
> +1.
>
>> I spent some time today looking at this and I think
>> it's a bug in v10, and specifically in add_paths_to_append_rel(),
>> which only sets partitioned_rels correctly when the appendrel is a
>> partitioned rel, and not when it's a subquery RTE with one or more
>> partitioned queries beneath it.
>>
>> Attached are two patches either one of which will fix it. First, I
>> wrote mechanical-partrels-fix.patch, which just mechanically
>> propagates partitioned_rels lists from accumulated subpaths into the
>> list used to construct the parent (Merge)AppendPath. I wasn't entire
>> happy with that, because it ends up building multiple partitioned_rels
>> lists for the same RelOptInfo. That seems silly, but there's no
>> principled way to avoid it; avoiding it amounts to hoping that all the
>> paths for the same relation carry the same partitioned_rels list,
>> which is uncomfortable.
>>
>> So then I wrote pcinfo-for-subquery.patch. That patch notices when an
>> RTE_SUBQUERY appendrel is processed and accumulates the
>> partitioned_rels of its immediate children; in case there can be
>> multiple nested levels of subqueries before we get down to the actual
>> partitioned rel, it also adds a PartitionedChildRelInfo for the
>> subquery RTE, so that there's no need to walk the whole tree to build
>> the partitioned_rels list at higher levels, just the immediate
>> children. I find this fix a lot more satisfying. It adds less code
>> and does no extra work in the common case.
>
> I very much like pcinfo-for-subquery.patch, although I'm not sure if we
> need to create PartitionedChildRelInfo for the sub-query parent RTE as the
> patch teaches add_paths_to_append_rel() to do. ISTM, nested UNION ALL
> subqueries are flattened way before we get to add_paths_to_append_rel();
> if it could not be flattened, there wouldn't be a call to
> add_paths_to_append_rel() in the first place, because no AppendRelInfos
> would be generated. See what happens when is_simple_union_all_recurse()
> returns false to flatten_simple_union_all() -- no AppendRelInfos will be
> generated and added to root->append_rel_list in that case.
>
> IOW, there won't be nested AppendRelInfos for nested UNION ALL sub-queries
> like we're setting out to build for multi-level partitioned tables.
>
> So, as things stand today, there can at most be one recursive call of
> add_path_to_append_rel() for a sub-query parent RTE, that is, if its child
> sub-queries contain partitioned tables, but not more. The other patch
> (multi-level expansion of partitioned tables) will change that, but even
> then we won't need sub-query's own PartitioendChildRelInfo.
>
>> Notice that the choice of fix we adopt has consequences for your
>> 0001-Multi-level-partitioned-table-expansion.patch -- with
>> mechanical-partrels-fix.patch, that patch could either associated all
>> partitioned_rels with the top-parent or it could work level by level
>> and everything would get properly assembled later. But with
>> pcinfo-for-subquery.patch, we need everything associated with the
>> top-parent. That doesn't seem like a problem to me, but it's
>> something to note.
>
> I think it's fine.
>
> With 0001-Multi-level-partitioned-table-expansion.patch,
> get_partitioned_child_rels() will get called even for non-root partitioned
> tables, for which it won't find a valid pcinfo. I think that patch must
> also change its callers to stop Asserting that a valid pcinfo is returned.
>
> Spotted a typo in pcinfo-for-subquery.patch:
>
> + * A plain relation will alread have
>
> Thanks,
> Amit
>
On TPC-H benchmarking of this patch, I found a regression in Q7. It
was taking some 1500s with the patch and some 900s without the patch.
Please find the attached pwd_reg.zip for the output of explain analyse
on head and with patch.

The experimental settings used were,
commit-id = 0c504a80cf2e6f66df2cdea563e879bf4abd1629
patch-version = v26

Server settings:
work_mem = 1GB
shared_buffers = 10GB
effective_cache_size = 10GB
max_parallel_workers_per_gather = 4

Partitioning information:
Partitioning scheme = by range
Number of partitions in lineitem and orders table = 106
partition key for lineitem = l_orderkey
partition key for orders = o_orderkey

Apart from these there is a regression case on a custom table, on head
query completes in 20s and with this patch it takes 27s. Please find
the attached .out and .sql file for the output and schema for the test
case respectively. I have reported this case before (sometime around
March this year) as well, but I am not sure if it was overlooked or is
an unimportant and expected behaviour for some reason.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachment Content-Type Size
pwj_reg.out application/octet-stream 31.7 KB
test_case_pwj.sql application/octet-stream 17.6 KB
pwj_reg.zip application/zip 81.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2017-09-15 08:43:39 Re: [PATCH] Improve geometric types
Previous Message Kyotaro HORIGUCHI 2017-09-15 08:31:52 Re: [PATCH] Improve geometric types