Re: Parallel grouping sets

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Pengzhou Tang <ptang(at)pivotal(dot)io>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, Jesse Zhang <sbjesse(at)gmail(dot)com>, Richard Guo <riguo(at)pivotal(dot)io>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Subject: Re: Parallel grouping sets
Date: 2020-03-24 03:13:09
Message-ID: 20200324031309.nryb2c6reecgeh4t@development
Lists: pgsql-hackers

On Fri, Mar 20, 2020 at 07:57:02PM +0800, Pengzhou Tang wrote:
>Hi Tomas,
>
>I rebased the code and resolved the comments you attached; the remaining
>unresolved comments are explained in 0002-fixes.patch, please take a look.
>
>I also made the hash spill work for parallel grouping sets; the plan
>looks like:
>
>gpadmin=# explain select g100, g10, sum(g::numeric), count(*), max(g::text)
>from gstest_p group by cube (g100,g10);
> QUERY PLAN
>-------------------------------------------------------------------------------------------
> Finalize MixedAggregate (cost=1000.00..7639.95 rows=1111 width=80)
> Filtered by: (GROUPINGSETID())
> Group Key: ()
> Hash Key: g100, g10
> Hash Key: g100
> Hash Key: g10
> Planned Partitions: 4
> -> Gather (cost=1000.00..6554.34 rows=7777 width=84)
> Workers Planned: 7
> -> Partial MixedAggregate (cost=0.00..4776.64 rows=1111 width=84)
> Group Key: ()
> Hash Key: g100, g10
> Hash Key: g100
> Hash Key: g10
> Planned Partitions: 4
> -> Parallel Seq Scan on gstest_p (cost=0.00..1367.71
>rows=28571 width=12)
>(16 rows)
>

Hmmm, OK. I think there's some sort of memory leak, though. I've tried
running a simple grouping set query on the catalog_sales table from a
TPC-DS test at scale 100GB. The query is pretty simple:

select count(*) from catalog_sales
group by cube (cs_warehouse_sk, cs_ship_mode_sk, cs_call_center_sk);

with a partial MixedAggregate plan (attached). When executed, however, it
allocates more and more memory, and eventually gets killed by the OOM
killer. This is on a machine with 8GB of RAM, work_mem=4MB (and 4
parallel workers).
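
(For readers less familiar with the syntax: CUBE over three columns is
shorthand for all 2^3 = 8 grouping sets over those columns, so the query
above is equivalent to the spelled-out form below. This expansion is just
standard SQL, not something specific to the patch.)

```sql
-- CUBE (a, b, c) expands to every subset of the three columns:
select count(*) from catalog_sales
group by grouping sets (
    (cs_warehouse_sk, cs_ship_mode_sk, cs_call_center_sk),
    (cs_warehouse_sk, cs_ship_mode_sk),
    (cs_warehouse_sk, cs_call_center_sk),
    (cs_ship_mode_sk, cs_call_center_sk),
    (cs_warehouse_sk),
    (cs_ship_mode_sk),
    (cs_call_center_sk),
    ()
);
```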

The memory context stats from a running process, shortly before it gets
killed by the OOM killer, look like this:

TopMemoryContext: 101560 total in 6 blocks; 7336 free (6 chunks); 94224 used
TopTransactionContext: 73816 total in 4 blocks; 11624 free (0 chunks); 62192 used
ExecutorState: 1375731712 total in 174 blocks; 5391392 free (382 chunks); 1370340320 used
HashAgg meta context: 315784 total in 10 blocks; 15400 free (2 chunks); 300384 used
ExprContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
ExprContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
ExprContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
...

That's 1.3GB allocated in ExecutorState - that doesn't seem right.

FWIW there are very few groups (each attribute has fewer than 30
distinct values, so there are only about 1000 groups). On master it
works just fine, of course.
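
(That group-count estimate isn't from the attached output; it's the kind
of sanity check one can run directly, e.g. by counting the rows the
grouping-set query actually produces:)

```sql
-- Total number of result groups across all 8 grouping sets; each inner
-- row is one group, so the outer count is the total group count.
select count(*) as total_groups
from (select 1 from catalog_sales
      group by cube (cs_warehouse_sk, cs_ship_mode_sk,
                     cs_call_center_sk)) g;
```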

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
plan.txt text/plain 1.3 KB
stats.txt text/plain 4.2 KB
