Re: [HACKERS] Partition-wise aggregation/grouping

From: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Date: 2018-02-13 07:07:58
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Thu, Feb 8, 2018 at 6:35 PM, Jeevan Chalke <
jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:

> Hi,
> In this attached version, I have rebased my changes over new design of
> partially_grouped_rel. The preparatory changes of adding
> partially_grouped_rel are in 0001.
> Also to minimize finalization code duplication, I have refactored them
> into two separate functions, finalize_sorted_partial_agg_path() and
> finalize_hashed_partial_agg_path(). I need to create these two functions
> as current path creation order in like,
> Sort Agg Path
> Sort Agg Path - Parallel Aware (Finalization needed here)
> Hash Agg Path
> Hash Agg Path - Parallel Aware (Finalization needed here)
> And if we club those finalizations together, then path creation order will
> be changed and it may result in the existing plan changes.
> Let me know if that's OK, I will merge them together as they are distinct
> anyways. These changes are part of 0002.
> 0003 - 0006 are refactoring patches as before.
> 0007 is the main patch per new design. I have removed
> create_partition_agg_paths() altogether as finalization code is reused.
> Also, renamed preferFullAgg with forcePartialAgg as we forcefully needed a
> partial path from nested level if the parent is doing a partial
> aggregation. add_single_path_to_append_rel() is no more exists and also
> there is no need to pass OtherUpperPathExtraData to
> add_paths_to_append_rel().
> 0008 - 0009, testcase and postgres_fdw changes.
> Please have a look at new changes and let me know if I missed any.
> Thanks

I was testing this patch for TPC-H benchmarking and came across following

Q1 completes in 229 secs with patch and in 66 secs without it. It looks
like with this patch the time of parallel seq scan itself is elevated for
some of the partitions. Notice for partitions, lineitem_3, lineitem_7,
lineitem_10, and linietem_5 it is some 13 secs which was somewhere around 5
secs on head.

Q6 completes in some 7 secs with patch and it takes 4 secs without it. This
is mainly caused because with the new parallel append, the parallel
operator below it (parallel index scan in this case) is not used, however,
on head it was the append of all the parallel index scans, which was saving
quite some time.

Q18 takes some 390 secs with patch and some 147 secs without it.

The experimental setup for these tests is as follows,
work_mem = 500MB
shared_buffers = 10GB
effective_cache_size = 4GB
seq_page_cost = random+page_cost = 0.01
enable_partition_wise_join = off

Partitioning info:
Total 10 partitions on tables - lineitem and orders each with partitioning
key being l_orderkey and o_orderkey respectively.

Please find the attached file for explain analyse outputs of each of the
reported query.
Rafia Sabih

Attachment Content-Type Size application/zip 39.9 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message amul sul 2018-02-13 07:11:26 Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
Previous Message Thomas Munro 2018-02-13 06:47:25 Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index