Re: [HACKERS] Partition-wise aggregation/grouping

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Date: 2018-01-02 08:13:32
Message-ID: CAM2+6=V334YYwQ+9btut8LozO+MRi=_e_Lss78dENMQM5wgOog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 14, 2017 at 4:01 PM, Ashutosh Bapat <
ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:

>
> Sure no problem. Take your time. Here's set of comments for 0008. That
> ends the first read of all the patches (2nd reading for the core
> changes)
>
> +-- Also, disable parallel paths.
> +SET max_parallel_workers_per_gather TO 0;
>
> If you enable parallel aggregation for smaller data partition-wise
> aggregation
> paths won't be chosen. I think this is the reason why you are disabling
> parallel query. But we should probably explain that in a comment. Better
> if we
> could come up testcases without disabling parallel query. Since parallel
> append
> is now committed, may be it can help.
>

Removed.

>
> +
> +-- Check with multiple columns in GROUP BY, order in target-list is
> reversed
> +EXPLAIN (COSTS OFF)
> +SELECT c, a, count(*) FROM pagg_tab GROUP BY a, c;
> + QUERY PLAN
> +-------------------------------------------------
> + Append
> + -> HashAggregate
> + Group Key: pagg_tab_p1.a, pagg_tab_p1.c
> + -> Seq Scan on pagg_tab_p1
> [ ... clipped ... ]
> +(10 rows)
>
> Why do we need this testcase?
>

Rajkumar, earlier reported one issue when order in the target list is
reversed. Fix then required redesigning the GROUP key matching algorithm.
So I think it will be good to have this testcase.

> +
> +SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
> + c | sum
> +---+-----
> +(0 rows)
>
> I think we also need a case when the child input relations are marked
> dummy and
> then the parent is marked dummy. Just use a condition with partkey = <none
> of
> list bounds>.
>

I have added the testcase for that. But don't you think both are same. When
all input children are dummy, parent too marked as dummy, i.e. input
relation is itself dummy.
Am I missing something here?

>
> +
> +-- Check with SORTED paths. Disable hashagg to get group aggregate
>
> Suggest: "Test GroupAggregate paths by disabling hash aggregates."
>
> +-- When GROUP BY clause matches with PARTITION KEY.
>
> I don't think we need "with", and just extend the same sentence with
> "complete
> aggregation is performed for each partition"
>
> +-- Should choose full partition-wise aggregation path
>
> suggest: "Should choose full partition-wise GroupAggregate plan", but I
> guess
> with the above suggestion, this sentence is not needed.
>
> +
> +-- When GROUP BY clause not matches with PARTITION KEY.
> +-- Should choose partial partition-wise aggregation path
>
> Similar suggestions as above.
>
> +-- No aggregates, but still able to perform partition-wise aggregates
>
> That's a funny construction. May be "Test partition-wise grouping without
> any
> aggregates".
>
> We should try some output for this query.
>
> +
> +EXPLAIN (COSTS OFF)
> +SELECT a FROM pagg_tab GROUP BY a ORDER BY 1;
> + QUERY PLAN
> +-------------------------------------------------
> + Group
> + Group Key: pagg_tab_p1.a
> + -> Merge Append
> + Sort Key: pagg_tab_p1.a
> + -> Group
> + Group Key: pagg_tab_p1.a
> + -> Sort
> + Sort Key: pagg_tab_p1.a
> + -> Seq Scan on pagg_tab_p1
> [ ... clipped ... ]
> +(19 rows)
>
> It's strange that we do not annotate partial grouping as Partial. Does not
> look
> like a bug in your patch. Do we get similar output with parallel grouping?
>
>
Its partial aggregation only which is finalize at the top.
But since tere are no aggregates involved we create a GROUP path and not an
AGG path. GROUP path has no partial annotations.
Yes, we see similar plan for parallel grouping too.

+
> +-- ORDERED SET within aggregate
> +EXPLAIN (COSTS OFF)
> +SELECT a, sum(b order by a) FROM pagg_tab GROUP BY a ORDER BY 1, 2;
> + QUERY PLAN
> +------------------------------------------------------------------------
> + Sort
> + Sort Key: pagg_tab_p1.a, (sum(pagg_tab_p1.b ORDER BY pagg_tab_p1.a))
> + -> GroupAggregate
> + Group Key: pagg_tab_p1.a
> + -> Sort
> + Sort Key: pagg_tab_p1.a
> + -> Append
> + -> Seq Scan on pagg_tab_p1
> + -> Seq Scan on pagg_tab_p2
> + -> Seq Scan on pagg_tab_p3
> +(10 rows)
>
> pagg_tab is partitioned by column c. So, not having it in GROUP BY
> itself might produce this plan if Partial parallel aggregation is
> expensive.
> When testing negative tests like this GROUP BY should always have the
> partition
> key.
>

I deliberatly wanted to test when GROUP BY key does not match with the
partition key so that partial aggregation is forced. But then we do have
some limitiation to perform the aggregation in partial i.e. ORDERED SET
cannot be done in partial mode, this is the test to excerisize that code
path.

> In case of full aggregation, since all the rows that belong to the same
> group
> come from the same partition, having an ORDER BY doesn't make any
> difference.
> We should support such a case.
>

We do support this.
Added testcase for it.

>
> +INSERT INTO pagg_tab1 SELECT i%30, i%20 FROM generate_series(0, 299, 2) i;
> +INSERT INTO pagg_tab2 SELECT i%20, i%30 FROM generate_series(0, 299, 3) i;
>
> spaces around % operator?
>
> +-- When GROUP BY clause matches with PARTITION KEY.
> +-- Should choose full partition-wise aggregation path.
>
> Probably we should just club single table and join cases under one set of
> comments rather than repeating those? Create the tables once at the
> beginning
> of the test file and group together the queries under one comment head.
>

I think other way round. It will be good to have corresponding
CREATE/INSERTs near the test queries to avoid lengthy scrolls to see the
table structure and data. Each query has a comment to describe what it does.

>
> +-- Disable mergejoin to get hash aggregate.
> +SET enable_mergejoin TO false;
>
> Why? We have tested that once.
>
>
Removed.

> +
> +-- When GROUP BY clause not matches with PARTITION KEY.
> +-- Should choose partial partition-wise aggregation path.
> +-- Also check with SORTED paths. Disable hashagg to get group aggregate.
> +SET enable_hashagg TO false;
>
> Same as above. Two of those clubbed together they will produce one hash
> and one
> group plan. That will cover it.
>

For join queries plan with GroupAgg is not chosen which I wanted to have in
a test-coverage. Thus kept this as is.
We have tested GroupAgg for single partitioned relations though. Let me
know if you think this test is not necessary, I will remove it then.

>
> +-- Check with LEFT/RIGHT/FULL OUTER JOINs which produces NULL values for
> +-- aggregation
> +-- LEFT JOIN, should produce partial partition-wise aggregation plan as
> +-- GROUP BY is on nullable column
> +EXPLAIN (COSTS OFF)
> +SELECT b.y, sum(a.y) FROM pagg_tab1 a LEFT JOIN pagg_tab2 b ON a.x =
> b.y GROUP BY 1 ORDER BY 1 NULLS LAST;
>
> May be you should explicitly use GROUP BY b.y in all of these queries.
>

I actually wanted to test GROUP BY n case too. But as you said, in these
queries I have used b.y and modified some other queries to have positional
notation in GROUP BY.

>
> +-- FULL JOIN, should produce partial partition-wise aggregation plan as
> +-- GROUP BY is on nullable column
>
> In case of a FULL JOIN partition keys from the joining relations land on
> nullable side; there is no key on non-nulllable side, so an aggregation on
> top
> of FULL JOIN will always be partial partition-wise aggregation.
>
>
Yep.
Do you want me to add this explanation in the comment? I don't think so.

+
> +-- Empty relations on LEFT side, no partition-wise agg plan.
>
> Suggest: Empty join relation because of empty outer side. I don't think
> we are
> writing a negative test to check whether partition-wise agg plan is not
> chosen.
> We are testing the case when the join relation is empty.
>

I didn't get what exactly you mean here. However updated the comment as per
your suggestion.

>
> +
> +EXPLAIN (COSTS OFF)
> +SELECT a, c, sum(b), avg(c), count(*) FROM pagg_tab GROUP BY c, a,
> (a+b)/2 HAVING sum(b) = 50 AND avg(c) > 25 ORDER BY 1, 2, 3;
>
> Keep this or the previous one, both is overkill. I will vote for this one,
> but
> it's upto you.
>

Removed previous one.

>
> May be add a testcase with the partition keys themselves switched; output
> just
> the plan.
>

I don't think we need this, instead modified the earlier one. Please have a
look.

>
> +-- Test with multi-level partitioning scheme
> +-- Partition-wise aggregation is tried only on first level.
> [ ... clipped ... ]
> +-- Full aggregation as GROUP BY clause matches with PARTITION KEY
>
> This seems to contradict with the previous comment. May be club them
> together
> and say "Partition-wise aggregation with full aggregation only at the first
> leve" and move that whole comment down.
>

> +
> +-- Partial aggregation as GROUP BY clause does not match with PARTITION
> KEY
> +EXPLAIN (COSTS OFF)
> +SELECT b, sum(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1, 2, 3;
> + QUERY PLAN
> +----------------------------------------------------------------
> + Sort
> + Sort Key: pagg_tab_p1.b, (sum(pagg_tab_p1.a)), (count(*))
> + -> Finalize GroupAggregate
> + Group Key: pagg_tab_p1.b
> + -> Sort
> + Sort Key: pagg_tab_p1.b
> + -> Append
> + -> Partial HashAggregate
> + Group Key: pagg_tab_p1.b
> + -> Seq Scan on pagg_tab_p1
> + -> Partial HashAggregate
> + Group Key: pagg_tab_p2_s1.b
> + -> Append
> + -> Seq Scan on pagg_tab_p2_s1
> + -> Seq Scan on pagg_tab_p2_s2
> + -> Partial HashAggregate
> + Group Key: pagg_tab_p3_s1.b
> + -> Append
> + -> Seq Scan on pagg_tab_p3_s1
> + -> Seq Scan on pagg_tab_p3_s2
> +(20 rows)
>
> Why aren't we seeing partial aggregation paths for level two and below
> partitions?
>

In this version of the patch I have not recursed into next level.
Will work on it and submit changes in the next patch-set.

>
> +
> +-- Test on middle level partitioned table which is further partitioned on
> b.
> +-- Full aggregation as GROUP BY clause matches with PARTITION KEY
> +EXPLAIN (COSTS OFF)
> +SELECT b, sum(a), count(*) FROM pagg_tab_p3 GROUP BY b ORDER BY 1, 2, 3;
> + QUERY PLAN
> +-------------------------------------------------------------------
> + Sort
> + Sort Key: pagg_tab_p3_s1.b, (sum(pagg_tab_p3_s1.a)), (count(*))
> + -> Append
> + -> HashAggregate
> + Group Key: pagg_tab_p3_s1.b
> + -> Seq Scan on pagg_tab_p3_s1
> + -> HashAggregate
> + Group Key: pagg_tab_p3_s2.b
> + -> Seq Scan on pagg_tab_p3_s2
> +(9 rows)
> +
> +SELECT b, sum(a), count(*) FROM pagg_tab_p3 GROUP BY b ORDER BY 1, 2, 3;
> + b | sum | count
> +---+------+-------
> + 0 | 2000 | 100
> + 1 | 2100 | 100
> + 2 | 2200 | 100
> + 3 | 2300 | 100
> + 4 | 2400 | 100
> + 5 | 2500 | 100
> + 6 | 2600 | 100
> + 7 | 2700 | 100
> + 8 | 2800 | 100
> + 9 | 2900 | 100
> +(10 rows)
>
> We should just remove this case, it's same as testing top-level partitioned
> tables.
>

Removed.

>
> +
> +-- Full aggregation as GROUP BY clause matches with PARTITION KEY
> +EXPLAIN (COSTS OFF)
> +SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab GROUP
> BY a, b HAVING avg(b) < 3 ORDER BY 1, 2, 3;
> + QUERY PLAN
> +-----------------------------------------------------------
> ---------------------------
> + Sort
> + Sort Key: pagg_tab_p1.a, (sum(pagg_tab_p1.b)), (array_agg(DISTINCT
> pagg_tab_p1.c))
> + -> Append
> + -> GroupAggregate
> + Group Key: pagg_tab_p1.a, pagg_tab_p1.b
> + Filter: (avg(pagg_tab_p1.b) < '3'::numeric)
> + -> Sort
> + Sort Key: pagg_tab_p1.a, pagg_tab_p1.b
> + -> Seq Scan on pagg_tab_p1
> + -> GroupAggregate
> + Group Key: pagg_tab_p2_s1.a, pagg_tab_p2_s1.b
> + Filter: (avg(pagg_tab_p2_s1.b) < '3'::numeric)
> + -> Sort
> + Sort Key: pagg_tab_p2_s1.a, pagg_tab_p2_s1.b
> + -> Append
> + -> Seq Scan on pagg_tab_p2_s1
> + -> Seq Scan on pagg_tab_p2_s2
> + -> GroupAggregate
> + Group Key: pagg_tab_p3_s1.a, pagg_tab_p3_s1.b
> + Filter: (avg(pagg_tab_p3_s1.b) < '3'::numeric)
> + -> Sort
> + Sort Key: pagg_tab_p3_s1.a, pagg_tab_p3_s1.b
> + -> Append
> + -> Seq Scan on pagg_tab_p3_s1
> + -> Seq Scan on pagg_tab_p3_s2
> +(25 rows)
>
> Instead of an Append node appearing under GroupAggregate, I think we should
> flatten all the partition scans for the subpartitions whose partition keys
> are
> part of group keys and add GroupAggregate on top of each of such partition
> scans.
>

Yes. As explained earlier, will do that as a separate patch.

> +-- Parallelism within partition-wise aggregates
> +RESET max_parallel_workers_per_gather;
> +SET min_parallel_table_scan_size TO '8kB';
> +SET parallel_setup_cost TO 0;
> +INSERT INTO pagg_tab_para SELECT i%30, i%20 FROM generate_series(0,
> 29999) i;
>
> spaces around % operator?
>
> +SHOW max_parallel_workers_per_gather;
> + max_parallel_workers_per_gather
> +---------------------------------
> + 2
>
> Why do we need this?
>
>
Removed.

> +
> +-- When GROUP BY clause matches with PARTITION KEY.
> +EXPLAIN (COSTS OFF)
> +SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x
> HAVING avg(y) < 7 ORDER BY 1, 2, 3;
> + QUERY PLAN
> +-----------------------------------------------------------
> ---------------------------
> + Sort
> + Sort Key: pagg_tab_para_p1.x, (sum(pagg_tab_para_p1.y)),
> (avg(pagg_tab_para_p1.y))
> + -> Append
> [ ... clipped ...]
> + -> Finalize GroupAggregate
> + Group Key: pagg_tab_para_p3.x
> + Filter: (avg(pagg_tab_para_p3.y) < '7'::numeric)
> + -> Sort
> + Sort Key: pagg_tab_para_p3.x
> + -> Gather
> + Workers Planned: 2
> + -> Partial HashAggregate
> + Group Key: pagg_tab_para_p3.x
> + -> Parallel Seq Scan on pagg_tab_para_p3
> [ ... clipped ... ]
> +-- When GROUP BY clause not matches with PARTITION KEY.
> +EXPLAIN (COSTS OFF)
> +SELECT y, sum(x), avg(x), count(*) FROM pagg_tab_para GROUP BY y
> HAVING avg(x) < 12 ORDER BY 1, 2, 3;
> + QUERY PLAN
> +-----------------------------------------------------------
> ---------------------------
> + Sort
> + Sort Key: pagg_tab_para_p1.y, (sum(pagg_tab_para_p1.x)),
> (avg(pagg_tab_para_p1.x))
> + -> Finalize HashAggregate
> + Group Key: pagg_tab_para_p1.y
> [ ... clipped ... ]
> + -> Gather
> + Workers Planned: 2
> + -> Partial HashAggregate
> + Group Key: pagg_tab_para_p3.y
> + -> Parallel Seq Scan on pagg_tab_para_p3
>
> Per a prior discussion on this thread, we shouldn't produce such plans;
> Parallel Append instead?
>

Yes. We do get a Parallel Append path now.
For full aggregation, normal Append plan in chosen over Append, but we do
create that.

> +SET enable_partition_wise_agg to true;
>
> May be just enable it at the beginning instead of enabling and disabling
> twice?
>

Done as you said. However, this affected one more testcase from
partition_join.sql. Updated expected output for that too.

Comments for which I have not responded are all done.

All these fixes are part of v9 patchset.

Thanks Ashutosh for detailed reviews so far.

>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>

Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2018-01-02 09:33:57 Re: [HACKERS] wrong t_bits alignment in pageinspect
Previous Message Jeevan Chalke 2018-01-02 07:30:34 Re: [HACKERS] Partition-wise aggregation/grouping