OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets

From: "Robert Bedell" <robert(at)friendlygenius(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets
Date: 2003-12-17 21:56:06
Message-ID: 200312171656870.SM00984@xavier
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm curious if anyone has ever looked into adding OLAP functionality (per
the SQL99 specs) into PostGreSQL. I don't actually own the proper SQL99
specifications, and since the newer sql2003 ones are coming out I don't know
which to purchase. Could someone point me in the right direction? I've
looked in the mailing lists and the docs and found some interest in olap
like functionality, but not like what I found in other databases (such as
Oracle and DB2).

More specifically I would like to add grouping sets, and the CUBE and ROLLUP
operators, into postgresql. Since modifying such the GROUP BY operation
would necessitate changing the query structure, wouldn't that affect the
query rewrites and genetic optimizer? On a superficial level yes, but would
any existing query rewriting care about additional grouping being done in a
GROUP BY operation? More specifically, what would changing the query
structure affect by making the GROUP BY clause a list of lists (the list of
the grouping sets) instead of an expression list as it currently is?

An example of a ROLLUP result might be (pseudoresults..):

CREATE TABLE SALES_SUMMARY (NAME TEXT, DEPARTMENT TEXT, SALES INTEGER);
-- populate with data

SELECT DEPARTMENT, NAME, SUM(SALES) FROM SALES_SUMMARY GROUP BY
ROLLUP(DEPARTMENT,NAME);

DEPARTMENT NAME SUM(SALES)
-------------- -------- ----------
Dept one Bob 13
Dept one Jeff 12
Dept one NULL 25
Dept two Jim 10
Dept two Mary 11
Dept two NULL 21
NULL NULL 46

-- Where the rows with NULLs represent the subtotals and grandtotals,
respectively.

Any thoughts?

Along the same vein, window partitioning would be nice for aggregates.
Aggregate expressions like: RANK() OVER (PARTITION BY DEPARTMENT ORDER BY
SALARY DESC). They get rid of a lot of subselect operations quite nicely.

These are not simple projects, I know. There are a lot of features in high
databases I would like to have in open source tools, and I would like to
make a contribution towards getting them there ;)

PS - ...no, I won't even mention materialized views...

-----------------
Robert Bedell

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2003-12-17 22:13:02 Re: TODO list
Previous Message Tom Lane 2003-12-17 21:47:21 Re: time interval behaviour seems odd