Quick Links

Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Олег Царев <zabivator(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)
Date:	2009-05-12 21:20:14
Message-ID:	162867790905121420p7c910054x24d8e327abd58cab@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello Oleg

I am sending a new CTE based variant of my GROUPING SETS patch,

this patch has some bugs but it is good prototype (it's more stable
than old patch):

postgres=# select selling_date, baguette, sum(items) from
baguette_selling group by grouping sets(1,2);
selling_date | baguette | sum
--------------+----------+-----
2007-10-30 | | 17
2007-10-31 | | 12
| golf | 9
| buster | 20
(4 rows)

postgres=# select selling_date, baguette, sum(items),
grouping(selling_date), grouping(baguette), grouping_id(selling_date,
baguette) from baguette_selling group by grouping sets(1,2);
selling_date | baguette | sum | grouping | grouping | grouping_id
--------------+----------+-----+----------+----------+-------------
2007-10-30 | | 17 | 1 | 0 | 2
2007-10-31 | | 12 | 1 | 0 | 2
| golf | 9 | 0 | 1 | 1
| buster | 20 | 0 | 1 | 1
(4 rows)

postgres=# select selling_date, baguette, sum(items),
grouping(selling_date), grouping(baguette), grouping_id(selling_date,
baguette) from baguette_selling group by grouping sets(1,2,());
selling_date | baguette | sum | grouping | grouping | grouping_id
--------------+----------+-----+----------+----------+-------------
2007-10-30 | | 17 | 1 | 0 | 2
2007-10-31 | | 12 | 1 | 0 | 2
| golf | 9 | 0 | 1 | 1
| buster | 20 | 0 | 1 | 1
| | 29 | 0 | 0 | 0
(5 rows)

I thing so parser part is well and correct (and ported to 8.4).

CTE works well, but not 100% effective, and will be better to use
direct tuplestore interface (as second technique - when hash tables
can't to be used).

I am thinking, so the best solution is enhancing current Aggregate
node for support of GroupingSets. The code base on UNION ALL is +/-
equal to CTE, and I don't thing, so this should be optimal. But work
freely, please. I have not free time for this patch next two months.
So if you have time, it's your.

regards
Pavel Stehule

2009/5/10 Олег Царев <zabivator(at)gmail(dot)com>:
> Hello all.
> Please, approve my ideas for implementation.
>
> Standart has feature T431: Extended grouping capabilities.
> This feature i found in TODO-list:
> http://wiki.postgresql.org/wiki/Todo -> SQL Commands -> TO DO
>
> MS SQL 2005 partial support this feature:
> http://www.kodyaz.com/articles/sql-server-2005-cube-rollup-cannot-compute-distinct-aggregates.aspx
> http://blogs.msdn.com/craigfr/archive/2007/09/21/aggregation-with-rollup.aspx
>
> MS SQL 2008 support this feature:
> http://blogs.msdn.com/craigfr/archive/2007/10/11/grouping-sets-in-sql-server-2008.aspx
>
> Oracle support this feature:
> http://www.compshack.com/sql/oracle-group-rollup
>
> So, it's short notes about GROUPING SETS, but more complete
> information have in a official documentation of MS SQL and Oracle
> (copyright limited for send as attach).
>
> First. GROUPG SETS.
>
> select A,B,C,SUM(D) from table group by GROUPING SETS( (A,B,C), (A),
> () ) - it's example of use grouping sets.
> Semantic of this construction - make group by over source more, than
> one group of column.
> It's very wide key - A,B C. In result set of this example we can find
> result set of select select A,B,C,SUM(D) from table group by A,B,C -
> as subset. It's mind: "GROUP BY A,B,C" - subset of "GROUP BY GROUPING
> SETS( (A,B,C), (A), () )
> Two subset - is GROUP BY A B, and instead C column we look NULL.
> Third subset - GROUP BY (), instead A,B,C - NULL, one row - it's name
> "GRAND TOTAL". - calculate over all subset without grouping
>
> Also have function "GROUPING" it's function say about null - "real
> null" (from table) or generated by "GROUP BY GROUPING SETS"
>
> My point: this feature can implement over GROUP BY and UNION ALL
> We can make decomposition of "GROUP BY GROUPING SETS( (A,B,C),(A),()
> )" to select A,B,C fron table GROUP BY A,B,C .UNION ALL select
> A,B,NULL from table group BY A,B UNION ALL NUll,NUll,NULL from table
> group by();
>
> So, it's very simple, don't require modification of executor and
> callibrate cost - only parser and semantic anylysis,
> '
> So, ROLLUP(A1,...An) is alias to "GROUP BY GROUPING SETS(
> (A1,...,An),(A1,...,An-1),... (An-1,An),(An),() ),
> CUBE - analogue.
>
> If this idea it's good - i can write code base on old patch
> http://archives.postgresql.org/pgsql-hackers/2008-10/msg00838.php or
> from clean list (as you wish).
>
> In future i know how to implement ROLLUP more optimal (executor
> iterator) and use this ROLLUP for optimisation another GROUP BY,
> GROUPING SETS.
>
> Thanks.
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Attachment	Content-Type	Size
gsets-0.3.diff	text/x-patch	45.3 KB

In response to

Implementation of GROUPING SETS (T431: Extended grouping capabilities) at 2009-05-10 12:57:06 from Олег Царев

Responses

Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) at 2009-05-12 22:13:38 from Олег Царев
Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) at 2009-05-13 03:09:09 from Joshua Tolley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2009-05-12 21:55:13	Fixing the libxml memory allocation situation
Previous Message	Simon Riggs	2009-05-12 21:15:26	Re: [COMMITTERS] pgsql: Fix LOCK TABLE to eliminate the race condition that could make it