Re: Parallel Aggregate

From: James Sewell <james(dot)sewell(at)lisasoft(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2016-03-14 04:05:43
Message-ID: CANkGpBsRLfGhvnbCRqHYJv_G=hAL6B9iui6kV+1xqFECwo+HwA@mail.gmail.com
Lists: pgsql-hackers

Hi again,

I've been playing around with inheritance combined with this patch.
Currently it looks like you are taking max(parallel_degree) from all the
child tables and using that for the number of workers.

For large machines it makes much more sense to use sum(parallel_degree), but
I've just seen this comment in the code:

/*
 * Decide what parallel degree to request for this append path. For
 * now, we just use the maximum parallel degree of any member. It
 * might be useful to use a higher number if the Append node were
 * smart enough to spread out the workers, but it currently isn't.
 */

Does this mean that even though we are aggregating in parallel, we are only
operating on one child table at a time currently?
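
For what it's worth, here's roughly the sort of setup I've been poking at (the
table names are just illustrative, and the data is loosely based on my earlier
mail). Two children of very different sizes should be given different parallel
degrees by the size-based logic, so the parent's worker count shows whether
max() or sum() is being used:

CREATE TABLE base (ts timestamp, count int);
CREATE TABLE base_small () INHERITS (base);
CREATE TABLE base_big   () INHERITS (base);

INSERT INTO base_small SELECT localtimestamp, g FROM generate_series(1, 100000) g;
INSERT INTO base_big   SELECT localtimestamp, g FROM generate_series(1, 20000000) g;
ANALYZE base_small;
ANALYZE base_big;

-- If the Append path only requests max(parallel_degree) over its members,
-- "Number of Workers" for the parent should match the big child on its own
-- rather than the total across both children.
EXPLAIN SELECT sum(count) FROM base GROUP BY date_trunc('DAY', ts);
EXPLAIN SELECT sum(count) FROM base_big GROUP BY date_trunc('DAY', ts);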

Cheers,

James Sewell,
Solutions Architect
______________________________________

Level 2, 50 Queen St, Melbourne VIC 3000

P (+61) 3 8370 8000  W www.lisasoft.com  F (+61) 3 8370 8099

On Mon, Mar 14, 2016 at 2:39 PM, James Sewell <james(dot)sewell(at)lisasoft(dot)com>
wrote:

> Cool,
>
> I've been testing how this works with partitioning (which seems to be
> strange, but I'll post separately about that) and something odd seems to be
> going on now with the parallel triggering:
>
> postgres=# create table a as select * from base_p2015_11;
> SELECT 20000000
>
> postgres=# select * from a limit 1;
>              ts             | count |  a  |  b   |  c   |  d   | e
> ----------------------------+-------+-----+------+------+------+---
>  2015-11-26 21:10:04.856828 |   860 | 946 | 1032 | 1118 | 1204 |
> (1 row)
>
> postgres-# \d a
> Table "datamart_owner.a"
> Column | Type | Modifiers
> --------+-----------------------------+-----------
> ts | timestamp without time zone |
> count | integer |
> a | integer |
> b | integer |
> c | integer |
> d | integer |
> e | integer |
>
> postgres=# select pg_size_pretty(pg_relation_size('a'));
> pg_size_pretty
> ----------------
> 1149 MB
>
> postgres=# explain select sum(count) from a group by date_trunc('DAY',ts);
> QUERY PLAN
> ----------------------------------------------------------------------------------------------
> Finalize GroupAggregate  (cost=218242.96..218254.46 rows=200 width=16)
>   Group Key: (date_trunc('DAY'::text, ts))
>   ->  Sort  (cost=218242.96..218245.96 rows=1200 width=16)
>         Sort Key: (date_trunc('DAY'::text, ts))
>         ->  Gather  (cost=218059.08..218181.58 rows=1200 width=16)
>               Number of Workers: 5
>               ->  Partial HashAggregate  (cost=217059.08..217061.58 rows=200 width=16)
>                     Group Key: date_trunc('DAY'::text, ts)
>                     ->  Parallel Seq Scan on a  (cost=0.00..197059.06 rows=4000005 width=12)
> (9 rows)
>
> postgres=# analyze a;
>
> postgres=# explain select sum(count) from a group by date_trunc('DAY',ts);
> QUERY PLAN
> --------------------------------------------------------------------------
> GroupAggregate  (cost=3164211.55..3564212.03 rows=20000024 width=16)
>   Group Key: (date_trunc('DAY'::text, ts))
>   ->  Sort  (cost=3164211.55..3214211.61 rows=20000024 width=12)
>         Sort Key: (date_trunc('DAY'::text, ts))
>         ->  Seq Scan on a  (cost=0.00..397059.30 rows=20000024 width=12)
> (5 rows)
>
> Unsure what's happening here.
>
>
>
> James Sewell,
> PostgreSQL Team Lead / Solutions Architect
> ______________________________________
>
>
> Level 2, 50 Queen St, Melbourne VIC 3000
>
> P (+61) 3 8370 8000  W www.lisasoft.com  F (+61) 3 8370 8099
>
>
> On Mon, Mar 14, 2016 at 1:31 PM, David Rowley <
> david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
>> On 14 March 2016 at 14:52, James Sewell <james(dot)sewell(at)lisasoft(dot)com>
>> wrote:
>> > One question - how is the upper limit of workers chosen?
>>
>> See create_parallel_paths() in allpaths.c. Basically the bigger the
>> relation (in pages) the more workers will be allocated, up until
>> max_parallel_degree.
>>
>> There is also a comment in that function which states:
>> /*
>>  * Limit the degree of parallelism logarithmically based on the size of the
>>  * relation. This probably needs to be a good deal more sophisticated, but we
>>  * need something here for now.
>>  */
>>
>> So this will likely see some revision at some point, after 9.6.
>>
>> --
>> David Rowley http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Training & Services
>>
>
>
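
As an aside on the logarithmic ramp-up David describes above: a rough way to
guess the worker count the planner would choose for a given relation is
something like the query below. Treat it as a sketch only; the ~1000-page
starting threshold and the "triple the size for each extra worker" step are my
assumptions about what create_parallel_paths() currently does, and the cap uses
the max_parallel_degree setting David mentions.

-- Sketch: estimated parallel degree for table "a", assuming one extra worker
-- each time the relation roughly triples in size past ~1000 pages, capped at
-- max_parallel_degree.
SELECT relname,
       relpages,
       least(
           1 + greatest(floor(log(3.0, greatest(relpages, 1)::numeric / 1000)), 0)::int,
           current_setting('max_parallel_degree')::int
       ) AS estimated_workers
FROM pg_class
WHERE relname = 'a';

For my ~1149 MB table (about 147k pages) that comes out at 5, which matches the
Gather node in the plan above.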

--

------------------------------
The contents of this email are confidential and may be subject to legal or
professional privilege and copyright. No representation is made that this
email is free of viruses or other defects. If you have received this
communication in error, you may not copy or distribute any part of it or
otherwise disclose its contents to anyone. Please advise the sender of your
incorrect receipt of this correspondence.
