Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Bedell <robert(at)friendlygenius(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets
Date: 2003-12-17 23:24:40
Message-ID: 1071703480.3831.24.camel@fuji.krosing.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane kirjutas N, 18.12.2003 kell 00:27:
> "Robert Bedell" <robert(at)friendlygenius(dot)com> writes:
> > I'm curious if anyone has ever looked into adding OLAP functionality (per
> > the SQL99 specs) into PostGreSQL.

As a first project one could think of implementing NULLS FIRST/LAST
(from 4.14.9) for all ORDER BY operations.

> There was a fairly crude CUBE implementation submitted (and rejected) a
> few months ago, but there's not been any work I thought had a chance of
> getting committed. This is an issue of implementation quality rather
> than whether we want the feature --- I think the community is interested
> in adding any and all features that are in SQL99.
>
> > More specifically I would like to add grouping sets, and the CUBE and ROLLUP
> > operators, into postgresql. Since modifying such the GROUP BY operation
> > would necessitate changing the query structure, wouldn't that affect the
> > query rewrites and genetic optimizer?
>
> I don't think either the rewriter or GEQO would notice at all. The
> regular optimizer definitely would though.

If it would mess up the optimiser, then could the extra aggregators not
be put in after optimisations, at least for hash aggregators ?

The implementation using hash aggregators should be just a SMOP
http://people.kldp.org/~eunjea/jargon/?idx=SMOP.html ;)

While ROLLUP could easily be implemented on the same scan over sorted
data as an ordinary GROUP BY by adding extra aggregators for partial
matches, CUBE and arbitrary grouping sets can't.

ie the query

SELECT SUM(a)
GROUP BY ROLLUP (b,c)

if run over a sorted set, would have a distinct aggregator for each of
(b,c), (b,) and (,) which can be reinitialised/reused every time b,c or
b changes. CUBE(b,c) would need hash aggregators for at least (,c).

> > These are not simple projects, I know.
>
> Might be a tad ambitious for your first venture into backend hacking...

OTOH it may be quite easy to *test* implementation ideas by always
expand GROUP BY into ROLLUP or CUBE by adding the aggregators just
before the executor, without touching the parser and pre-optimisation
query tree at all.

--------------
Hannu

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2003-12-17 23:32:40 Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets
Previous Message Þórhallur Hálfdánarson 2003-12-17 23:03:48 Re: TODO list