Re: [HACKERS] Partition-wise aggregation/grouping

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Date: 2017-12-14 10:31:01
Message-ID: CAFjFpReYBgToHrYwMxbJMb=MoWgD=iVAALnXZ3p0JafBkx61OA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 13, 2017 at 6:37 PM, Jeevan Chalke
<jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
>
>
> On Tue, Dec 12, 2017 at 3:43 PM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>
>> Here are review comments for 0009
>
>
> Thank you, Ashutosh for the detailed review so far.
>
> I am working on your reviews but since parallel Append is now committed,
> I need to re-base my changes over it and need to resolve the conflicts too.
>
> Once done, I will submit the new patch-set fixing these and earlier review
> comments.
>

Sure no problem. Take your time. Here's set of comments for 0008. That
ends the first read of all the patches (2nd reading for the core
changes)

+-- Also, disable parallel paths.
+SET max_parallel_workers_per_gather TO 0;

If you enable parallel aggregation for smaller data partition-wise aggregation
paths won't be chosen. I think this is the reason why you are disabling
parallel query. But we should probably explain that in a comment. Better if we
could come up testcases without disabling parallel query. Since parallel append
is now committed, may be it can help.

+
+-- Check with multiple columns in GROUP BY, order in target-list is reversed
+EXPLAIN (COSTS OFF)
+SELECT c, a, count(*) FROM pagg_tab GROUP BY a, c;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> HashAggregate
+ Group Key: pagg_tab_p1.a, pagg_tab_p1.c
+ -> Seq Scan on pagg_tab_p1
[ ... clipped ... ]
+(10 rows)

Why do we need this testcase?

+
+-- Test when input relation for grouping is dummy
+EXPLAIN (COSTS OFF)
+SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
+ QUERY PLAN
+--------------------------------
+ HashAggregate
+ Group Key: pagg_tab.c
+ -> Result
+ One-Time Filter: false
+(4 rows)

Not part of your patch, I am wondering if we can further optimize this plan by
converting HashAggregate to Result (One-time Filter: false) and the aggregate
target. Just an idea.

+
+SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
+ c | sum
+---+-----
+(0 rows)

I think we also need a case when the child input relations are marked dummy and
then the parent is marked dummy. Just use a condition with partkey = <none of
list bounds>.

+
+-- Check with SORTED paths. Disable hashagg to get group aggregate

Suggest: "Test GroupAggregate paths by disabling hash aggregates."

+-- When GROUP BY clause matches with PARTITION KEY.

I don't think we need "with", and just extend the same sentence with "complete
aggregation is performed for each partition"

+-- Should choose full partition-wise aggregation path

suggest: "Should choose full partition-wise GroupAggregate plan", but I guess
with the above suggestion, this sentence is not needed.

+
+-- When GROUP BY clause not matches with PARTITION KEY.
+-- Should choose partial partition-wise aggregation path

Similar suggestions as above.

+-- No aggregates, but still able to perform partition-wise aggregates

That's a funny construction. May be "Test partition-wise grouping without any
aggregates".

We should try some output for this query.

+
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab GROUP BY a ORDER BY 1;
+ QUERY PLAN
+-------------------------------------------------
+ Group
+ Group Key: pagg_tab_p1.a
+ -> Merge Append
+ Sort Key: pagg_tab_p1.a
+ -> Group
+ Group Key: pagg_tab_p1.a
+ -> Sort
+ Sort Key: pagg_tab_p1.a
+ -> Seq Scan on pagg_tab_p1
[ ... clipped ... ]
+(19 rows)

It's strange that we do not annotate partial grouping as Partial. Does not look
like a bug in your patch. Do we get similar output with parallel grouping?

+
+-- ORDERED SET within aggregate
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b order by a) FROM pagg_tab GROUP BY a ORDER BY 1, 2;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Sort
+ Sort Key: pagg_tab_p1.a, (sum(pagg_tab_p1.b ORDER BY pagg_tab_p1.a))
+ -> GroupAggregate
+ Group Key: pagg_tab_p1.a
+ -> Sort
+ Sort Key: pagg_tab_p1.a
+ -> Append
+ -> Seq Scan on pagg_tab_p1
+ -> Seq Scan on pagg_tab_p2
+ -> Seq Scan on pagg_tab_p3
+(10 rows)

pagg_tab is partitioned by column c. So, not having it in GROUP BY
itself might produce this plan if Partial parallel aggregation is expensive.
When testing negative tests like this GROUP BY should always have the partition
key.

In case of full aggregation, since all the rows that belong to the same group
come from the same partition, having an ORDER BY doesn't make any difference.
We should support such a case.

+INSERT INTO pagg_tab1 SELECT i%30, i%20 FROM generate_series(0, 299, 2) i;
+INSERT INTO pagg_tab2 SELECT i%20, i%30 FROM generate_series(0, 299, 3) i;

spaces around % operator?

+-- When GROUP BY clause matches with PARTITION KEY.
+-- Should choose full partition-wise aggregation path.

Probably we should just club single table and join cases under one set of
comments rather than repeating those? Create the tables once at the beginning
of the test file and group together the queries under one comment head.

+-- Disable mergejoin to get hash aggregate.
+SET enable_mergejoin TO false;

Why? We have tested that once.

+
+-- When GROUP BY clause not matches with PARTITION KEY.
+-- Should choose partial partition-wise aggregation path.
+-- Also check with SORTED paths. Disable hashagg to get group aggregate.
+SET enable_hashagg TO false;

Same as above. Two of those clubbed together they will produce one hash and one
group plan. That will cover it.

+-- Check with LEFT/RIGHT/FULL OUTER JOINs which produces NULL values for
+-- aggregation
+-- LEFT JOIN, should produce partial partition-wise aggregation plan as
+-- GROUP BY is on nullable column
+EXPLAIN (COSTS OFF)
+SELECT b.y, sum(a.y) FROM pagg_tab1 a LEFT JOIN pagg_tab2 b ON a.x =
b.y GROUP BY 1 ORDER BY 1 NULLS LAST;

May be you should explicitly use GROUP BY b.y in all of these queries.

+-- FULL JOIN, should produce partial partition-wise aggregation plan as
+-- GROUP BY is on nullable column

In case of a FULL JOIN partition keys from the joining relations land on
nullable side; there is no key on non-nulllable side, so an aggregation on top
of FULL JOIN will always be partial partition-wise aggregation.

+
+-- Empty relations on LEFT side, no partition-wise agg plan.

Suggest: Empty join relation because of empty outer side. I don't think we are
writing a negative test to check whether partition-wise agg plan is not chosen.
We are testing the case when the join relation is empty.

+
+EXPLAIN (COSTS OFF)
+SELECT a, c, sum(b), avg(c), count(*) FROM pagg_tab GROUP BY c, a,
(a+b)/2 HAVING sum(b) = 50 AND avg(c) > 25 ORDER BY 1, 2, 3;

Keep this or the previous one, both is overkill. I will vote for this one, but
it's upto you.

May be add a testcase with the partition keys themselves switched; output just
the plan.

+-- Test with multi-level partitioning scheme
+-- Partition-wise aggregation is tried only on first level.
[ ... clipped ... ]
+-- Full aggregation as GROUP BY clause matches with PARTITION KEY

This seems to contradict with the previous comment. May be club them together
and say "Partition-wise aggregation with full aggregation only at the first
leve" and move that whole comment down.

+
+-- Partial aggregation as GROUP BY clause does not match with PARTITION KEY
+EXPLAIN (COSTS OFF)
+SELECT b, sum(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1, 2, 3;
+ QUERY PLAN
+----------------------------------------------------------------
+ Sort
+ Sort Key: pagg_tab_p1.b, (sum(pagg_tab_p1.a)), (count(*))
+ -> Finalize GroupAggregate
+ Group Key: pagg_tab_p1.b
+ -> Sort
+ Sort Key: pagg_tab_p1.b
+ -> Append
+ -> Partial HashAggregate
+ Group Key: pagg_tab_p1.b
+ -> Seq Scan on pagg_tab_p1
+ -> Partial HashAggregate
+ Group Key: pagg_tab_p2_s1.b
+ -> Append
+ -> Seq Scan on pagg_tab_p2_s1
+ -> Seq Scan on pagg_tab_p2_s2
+ -> Partial HashAggregate
+ Group Key: pagg_tab_p3_s1.b
+ -> Append
+ -> Seq Scan on pagg_tab_p3_s1
+ -> Seq Scan on pagg_tab_p3_s2
+(20 rows)

Why aren't we seeing partial aggregation paths for level two and below
partitions?

+
+-- Test on middle level partitioned table which is further partitioned on b.
+-- Full aggregation as GROUP BY clause matches with PARTITION KEY
+EXPLAIN (COSTS OFF)
+SELECT b, sum(a), count(*) FROM pagg_tab_p3 GROUP BY b ORDER BY 1, 2, 3;
+ QUERY PLAN
+-------------------------------------------------------------------
+ Sort
+ Sort Key: pagg_tab_p3_s1.b, (sum(pagg_tab_p3_s1.a)), (count(*))
+ -> Append
+ -> HashAggregate
+ Group Key: pagg_tab_p3_s1.b
+ -> Seq Scan on pagg_tab_p3_s1
+ -> HashAggregate
+ Group Key: pagg_tab_p3_s2.b
+ -> Seq Scan on pagg_tab_p3_s2
+(9 rows)
+
+SELECT b, sum(a), count(*) FROM pagg_tab_p3 GROUP BY b ORDER BY 1, 2, 3;
+ b | sum | count
+---+------+-------
+ 0 | 2000 | 100
+ 1 | 2100 | 100
+ 2 | 2200 | 100
+ 3 | 2300 | 100
+ 4 | 2400 | 100
+ 5 | 2500 | 100
+ 6 | 2600 | 100
+ 7 | 2700 | 100
+ 8 | 2800 | 100
+ 9 | 2900 | 100
+(10 rows)

We should just remove this case, it's same as testing top-level partitioned
tables.

+
+-- Full aggregation as GROUP BY clause matches with PARTITION KEY
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab GROUP
BY a, b HAVING avg(b) < 3 ORDER BY 1, 2, 3;
+ QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+ Sort Key: pagg_tab_p1.a, (sum(pagg_tab_p1.b)), (array_agg(DISTINCT
pagg_tab_p1.c))
+ -> Append
+ -> GroupAggregate
+ Group Key: pagg_tab_p1.a, pagg_tab_p1.b
+ Filter: (avg(pagg_tab_p1.b) < '3'::numeric)
+ -> Sort
+ Sort Key: pagg_tab_p1.a, pagg_tab_p1.b
+ -> Seq Scan on pagg_tab_p1
+ -> GroupAggregate
+ Group Key: pagg_tab_p2_s1.a, pagg_tab_p2_s1.b
+ Filter: (avg(pagg_tab_p2_s1.b) < '3'::numeric)
+ -> Sort
+ Sort Key: pagg_tab_p2_s1.a, pagg_tab_p2_s1.b
+ -> Append
+ -> Seq Scan on pagg_tab_p2_s1
+ -> Seq Scan on pagg_tab_p2_s2
+ -> GroupAggregate
+ Group Key: pagg_tab_p3_s1.a, pagg_tab_p3_s1.b
+ Filter: (avg(pagg_tab_p3_s1.b) < '3'::numeric)
+ -> Sort
+ Sort Key: pagg_tab_p3_s1.a, pagg_tab_p3_s1.b
+ -> Append
+ -> Seq Scan on pagg_tab_p3_s1
+ -> Seq Scan on pagg_tab_p3_s2
+(25 rows)

Instead of an Append node appearing under GroupAggregate, I think we should
flatten all the partition scans for the subpartitions whose partition keys are
part of group keys and add GroupAggregate on top of each of such partition
scans.

+-- Parallelism within partition-wise aggregates
+RESET max_parallel_workers_per_gather;
+SET min_parallel_table_scan_size TO '8kB';
+SET parallel_setup_cost TO 0;
+INSERT INTO pagg_tab_para SELECT i%30, i%20 FROM generate_series(0, 29999) i;

spaces around % operator?

+SHOW max_parallel_workers_per_gather;
+ max_parallel_workers_per_gather
+---------------------------------
+ 2

Why do we need this?

+
+-- When GROUP BY clause matches with PARTITION KEY.
+EXPLAIN (COSTS OFF)
+SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x
HAVING avg(y) < 7 ORDER BY 1, 2, 3;
+ QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+ Sort Key: pagg_tab_para_p1.x, (sum(pagg_tab_para_p1.y)),
(avg(pagg_tab_para_p1.y))
+ -> Append
[ ... clipped ...]
+ -> Finalize GroupAggregate
+ Group Key: pagg_tab_para_p3.x
+ Filter: (avg(pagg_tab_para_p3.y) < '7'::numeric)
+ -> Sort
+ Sort Key: pagg_tab_para_p3.x
+ -> Gather
+ Workers Planned: 2
+ -> Partial HashAggregate
+ Group Key: pagg_tab_para_p3.x
+ -> Parallel Seq Scan on pagg_tab_para_p3
[ ... clipped ... ]
+-- When GROUP BY clause not matches with PARTITION KEY.
+EXPLAIN (COSTS OFF)
+SELECT y, sum(x), avg(x), count(*) FROM pagg_tab_para GROUP BY y
HAVING avg(x) < 12 ORDER BY 1, 2, 3;
+ QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+ Sort Key: pagg_tab_para_p1.y, (sum(pagg_tab_para_p1.x)),
(avg(pagg_tab_para_p1.x))
+ -> Finalize HashAggregate
+ Group Key: pagg_tab_para_p1.y
[ ... clipped ... ]
+ -> Gather
+ Workers Planned: 2
+ -> Partial HashAggregate
+ Group Key: pagg_tab_para_p3.y
+ -> Parallel Seq Scan on pagg_tab_para_p3

Per a prior discussion on this thread, we shouldn't produce such plans;
Parallel Append instead?

+SET enable_partition_wise_agg to true;

May be just enable it at the beginning instead of enabling and disabling twice?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Teodor Sigaev 2017-12-14 10:36:47 Re: CUBE seems a bit confused about ORDER BY
Previous Message Fabien COELHO 2017-12-14 09:41:04 Re: pgbench's expression parsing & negative numbers