WIP Patch for GROUPING SETS phase 1

From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Subject: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-13 18:37:03
Message-ID: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
Lists: pgsql-hackers

This is phase 1 (of either 2 or 3) of the implementation of the standard
GROUPING SETS feature, done by Andrew Gierth and myself.

Unlike previous attempts at this feature, we make no attempt to do
any serious work in the parser; we perform some minor syntactic
simplifications described in the spec, such as removing excess parens,
but the original query structure is preserved in views and so on.

So far, we have done most of the actual work in the executor, but
further phases will concentrate on the planner. We have not yet
tackled the hard problem of generating plans that require multiple
passes over the same input data; see below regarding design issues.

What works so far:

- all the standard syntax is accepted (but many combinations are not
plannable yet)

- while the spec only allows column references in GROUP BY, we
continue to allow arbitrary expressions

- grouping sets that can be computed in a single pass over sorted
data (i.e. anything that can be reduced to simple columns plus one
ROLLUP clause, regardless of how it was specified in the query) are
implemented as part of the existing GroupAggregate executor node

- all kinds of aggregate functions, including ordered set functions
and user-defined aggregates, are supported in conjunction with
grouping sets (no API changes, other than one caveat about fn_extra)

- the GROUPING() operation defined in the spec is implemented,
including support for multiple args, and supports arbitrary
expressions as an extension to the spec
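
To illustrate the single-pass case and the GROUPING() result listed above,
here is a minimal Python sketch. It is not code from the patch: the name
rollup_sum, the row layout and the choice of sum() as the only aggregate are
assumptions made for illustration. It keeps one running transition value per
ROLLUP level and closes the affected levels whenever a sort column changes.
The GROUPING() value follows the spec's convention: a bit is set when the
corresponding column is not part of the current grouping set, with the
leftmost argument as the most significant bit.

```python
from typing import Iterable, Iterator, Sequence, Tuple

Row = Tuple[Sequence[object], int]  # (grouping column values, value to sum)


def rollup_sum(sorted_rows: Iterable[Row], ncols: int) -> Iterator[tuple]:
    """Yield (key, grouping_bits, total) for ROLLUP over ncols columns.

    sorted_rows must already be sorted by the grouping columns, left to
    right. Level k groups by the first k columns; level 0 is the grand
    total. Empty input yields no rows in this sketch.
    """
    totals = [0] * (ncols + 1)  # one running transition value per level
    prev = None                 # grouping key of the previous row

    def emit(level: int) -> tuple:
        # Columns beyond `level` are not grouped, so they come out as None
        # and their GROUPING() bits are set.
        key = tuple(prev[:level]) + (None,) * (ncols - level)
        bits = (1 << (ncols - level)) - 1
        return key, bits, totals[level]

    for cols, value in sorted_rows:
        cols = tuple(cols)
        if prev is not None and cols != prev:
            # Leftmost column position at which the key changed.
            changed = next(i for i in range(ncols) if cols[i] != prev[i])
            # Close every level that depends on a changed column,
            # finest level first.
            for level in range(ncols, changed, -1):
                yield emit(level)
                totals[level] = 0
        prev = cols
        # The same "call site" advances all levels' transition values.
        for level in range(ncols + 1):
            totals[level] += value

    if prev is not None:
        for level in range(ncols, -1, -1):
            yield emit(level)


rows = [(("x", 1), 10), (("x", 2), 5), (("y", 1), 7)]
result = list(rollup_sum(rows, 2))
```

Note that the transition values for the different levels are advanced in an
interleaved fashion by the same loop body, which is the situation in which
per-transition-value state must not be kept in per-call-site storage.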

Changes/incompatibilities:

- the big compatibility issue: CUBE and ROLLUP are now partially
reserved (col_name_keyword), which breaks contrib/cube. A separate
patch for contrib/ is attached that renames the cube type to "cube"; a
new name really needs to be chosen.

- GROUPING is now a fully reserved word, and SETS is an unreserved keyword

- GROUP BY (a,b) now means GROUP BY a,b (as required by spec).
GROUP BY ROW(a,b) still has the old meaning.

- GROUP BY () is now supported too.

- fn_extra for aggregate calls is per-call-site and NOT
per-transition-value - the same fn_extra will be used for interleaved
calls to the transition function with different transition values.
fn_extra, if used at all, should be used only for per-call-site info
such as data types, as clarified in the 9.4beta changes to the ordered
set function implementation.

Future work:

We envisage that handling of arbitrary grouping sets will be best
done by having the planner generate an Append of multiple
aggregation paths, presumably with some way of moving the original
input path to a CTE. We have not yet really explored how hard this
will be; suggestions are welcome.
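
As a rough illustration of what such a planner step might have to decide,
the Python sketch below splits an arbitrary list of grouping sets into
chains that are nested by inclusion. Each chain can be reduced to one
ROLLUP and so computed in a single sorted pass; the results of the chains
would then be appended. This is only one possible approach and is not part
of the patch: the helper names (rollup, cube, split_into_chains) are
hypothetical, and the greedy strategy is not guaranteed to produce the
minimum number of chains.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Sequence

GSet = FrozenSet[str]


def rollup(cols: Sequence[str]) -> List[GSet]:
    """ROLLUP(c1..cn): every prefix of the column list, longest first."""
    return [frozenset(cols[:k]) for k in range(len(cols), -1, -1)]


def cube(cols: Sequence[str]) -> List[GSet]:
    """CUBE(c1..cn): every subset of the column list."""
    return [frozenset(c)
            for r in range(len(cols), -1, -1)
            for c in combinations(cols, r)]


def split_into_chains(grouping_sets: Iterable[GSet]) -> List[List[GSet]]:
    """Greedily partition grouping sets into chains nested by inclusion.

    Within a chain every set is a proper subset of the one before it, so
    a single sort order serves the whole chain.
    """
    chains: List[List[GSet]] = []
    # Largest sets first, so each chain runs from finest to coarsest.
    ordered = sorted(set(grouping_sets), key=lambda s: (-len(s), sorted(s)))
    for gset in ordered:
        for chain in chains:
            if gset < chain[-1]:  # proper subset of the chain's smallest set
                chain.append(gset)
                break
        else:
            chains.append([gset])
    return chains


one_pass = split_into_chains(rollup(["a", "b", "c"]))
two_pass = split_into_chains(cube(["a", "b"]))
```

Under these assumptions a plain ROLLUP yields a single chain, while
CUBE(a,b) needs two, each of which would read the same input once.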

In the executor, it is obviously possible to extend HashAggregate to
handle arbitrary collections of grouping sets, but even if the memory
usage issue were solved, this would leave the question of what to do
with non-hashable data types, so it seems that the planner work
probably can't be avoided.
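
The two concerns with a hash-based approach can be seen in a small Python
analogy (again not code from the patch; hash_grouping_sets and the row
layout are assumptions for illustration). One hash table per grouping set
is filled in a single pass, so all tables are held in memory at once, and
any value that cannot be hashed makes the approach fail outright. Python's
unhashable types stand in here for SQL data types that lack hashing
support.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Tuple[Sequence[object], int]  # (grouping column values, value to sum)


def hash_grouping_sets(rows: Iterable[Row],
                       columns: Sequence[str],
                       grouping_sets: Sequence[Sequence[str]]
                       ) -> List[Dict[tuple, int]]:
    """Sum the row values for every grouping set in one pass over rows."""
    index = {name: i for i, name in enumerate(columns)}
    # One hash table per grouping set; all of them stay live until the end.
    tables: List[Dict[tuple, int]] = [{} for _ in grouping_sets]
    for cols, value in rows:
        for table, gset in zip(tables, grouping_sets):
            key = tuple(cols[index[c]] for c in gset)
            # The lookup raises TypeError if any key value is unhashable.
            table[key] = table.get(key, 0) + value
    return tables


rows = [(("x", 1), 10), (("x", 2), 5), (("y", 1), 7)]
by_a, by_b = hash_grouping_sets(rows, ("a", "b"), [("a",), ("b",)])
```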

A new name needs to be found for the "cube" data type.

At this point we are more interested in design review than in
necessarily committing this patch in its current state. However,
committing it may make future work easier; we leave that question
open.

Regards,

Atri

Attachment Content-Type Size
groupingsets_ver1.patch text/x-diff 168.3 KB
