Re: Window functions patch v04 for the September commit fest

From: David Fetter <david(at)fetter(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Window functions patch v04 for the September commit fest
Date: 2008-09-01 20:39:03
Message-ID: 20080901203903.GP3717@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 01, 2008 at 09:00:47PM +0300, Heikki Linnakangas wrote:
> Hitoshi Harada wrote:
>> 2008/8/30 Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>:
>>> Here's the latest window functions patch against HEAD. It seems to be
>>> ready for the September commit fest, as added documents, WINDOW clause
>>> feature and misc tests. I guess this would be the window functions
>>> feature freeze for 8.4. The remaining feature will be implemented for
>>> the later release.
>>>
>>> This patch consists of:
>>> - windowed aggregates
>>> - cooperate with GROUP BY aggregates
>>> - some optimizations with multiple windows
>>> - ranking functions
>>> - WINDOW clause
>>> - windowing SQL regression tests
>>> - sgml documents
>>>
>>> Looking up the total road map, the dropped features are:
>>>
>>> - sliding window (window framing)
>>> - lead(), lag(), etc. that reach for random rows
>>> - user defined window functions
>
> Ok, I'm starting to read up on SQL2003 window functions,

Maybe it would be better to read the SQL2008 standard
<http://wiscorp.com/sql200n.zip> :)

> and to review this patch. Here's a long brain-dump of my first
> thoughts on the design.
>
> Let's list a couple of requirements first:
>
> 1. It's important that what gets committed now can be extended to
> handle all of the window function stuff in SQL2003 in the future,
> as well as user-defined-window-functions in the spirit of
> PostgreSQL extensibility. Even if we don't implement all of it in
> this release.

> 2. Performance needs to be reasonable, in the face a large number of
> input rows. These are typically used in OLAP queries that process
> gazillions of rows. Some window functions need access to all rows in the
> window frame or partition, so you can't reasonably expect great
> performance with those if the working set is large, but those that
> don't, should work with finite memory requirements, and reasonably fast.
>
> Because of 1., let's focus on the interface for
> user-defined-window-functions first. Ideally, the interface is such
> that all the window functions and aggregates defined in SQL2003,
> and others we might want to have in core or as pgfoundry projects,
> can be implemented using it.

SQL2008 defines the following:

<window function> ::=
<window function type> OVER <window name or specification>

<window function type> ::=
<rank function type> <left paren> <right paren>
| ROW_NUMBER <left paren> <right paren>
| <aggregate function>
| <ntile function>
| <lead or lag function>
| <first or last value function>
| <nth value function>

<rank function type> ::=
RANK
| DENSE_RANK
| PERCENT_RANK
| CUME_DIST

<ntile function> ::=
NTILE <left paren> <number of tiles> <right paren>

<number of tiles> ::=
<simple value specification>
| <dynamic parameter specification>

<lead or lag function> ::=
<lead or lag> <left paren> <lead or lag extent>
[ <comma> <offset> [ <comma> <default expression> ] ] <right paren>
[ <null treatment> ]

<lead or lag> ::=
LEAD | LAG

<lead or lag extent> ::=
<value expression>

<offset> ::=
<exact numeric literal>

<default expression> ::=
<value expression>

<null treatment> ::=
RESPECT NULLS | IGNORE NULLS

<first or last value function> ::=
<first or last value> <left paren> <value expression> <right paren> [ <null treatment> ]

<first or last value> ::=
FIRST_VALUE | LAST_VALUE

<nth value function> ::=
NTH_VALUE <left paren> <value expression> <comma> <nth row> <right paren>
[ <from first or last> ] [ <null treatment> ]

<nth row> ::=
<simple value specification>
| <dynamic parameter specification>

<from first or last> ::=
FROM FIRST
| FROM LAST

<window name or specification> ::=
<window name>
| <in-line window specification>

<in-line window specification> ::=
<window specification>

> For functions like LEAD or CUME_DIST that don't immediately start
> returning values, the executor will need a tuplestore to buffer input
> rows until it gets their values from the window function.

For these aggregates, should there be an API that signals that a tuplestore
will be needed?

> The syntax for creating a ranking aggregate might look something like this:
>
> CREATE RANKING AGGREGATE name (
> SFUNC = sfunc,
> STYPE = state_data_type,
> OUTFUNC = outfunc
> )
>
> This is very similar to Simon's proposal, and to current CREATE
> AGGREGATE, but tweaked a bit to fix the problems Hitoshi pointed
> out.
>
> sfunc is called once for each input row. Unlike with normal
> aggregates, sfunc is passed the whole input row, so that e.g RANK
> can compare it against the previous row, or LEAD can buffer it.
>
> outfunc is a set-returning function, and it is called until it
> returns no more rows, after each sfunc invocation.

> Window aggregates
> -----------------
>
> Like normal aggregates, window aggregates process N input values, and
> return one output value. In normal GROUP BY expressions, the aggregate
> function is called once each group, but in a window aggregate expression
> with a window frame (aka sliding window), there is as many "groups" as
> there is rows.
>
> In fact, any normal aggregate function can also be used as a window
> aggregate and vice versa. The trivial implementation is to have a buffer
> to hold all values currently in the window frame, and for each input
> row, invoke the aggregate function over all the rows currently in the
> frame. I think we'll need support that, to allow using any built-in or
> user-defined aggregate as a window aggregate.
>
> However, we also want to provide optimized versions of the common
> aggregates. Aggregates like COUNT, SUM, and AVG can easily be
> implemented more efficiently, without rescanning all the tuples in the
> group.
>
> I propose an extension to the current CREATE AGGREGATE syntax to allow
> for optimized versions. Currently, the syntax is like:
>
> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
> SFUNC = sfunc,
> STYPE = state_data_type
> [ , FINALFUNC = ffunc ]
> [ , INITCOND = initial_condition ]
> [ , SORTOP = sort_operator ]
> )
>
> Let's add a function to allow removing tuples from the current window frame:
>
> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
> SFUNC = sfunc,
> STYPE = state_data_type
> [ , FINALFUNC = ffunc ]
> [ , INITCOND = initial_condition ]
> [ , SORTOP = sort_operator ]
> [ , SUBTRACTFUNC = subfunc,
> )
>
> subfunc is like sfunc, except that the input value is "subtracted" from
> the current value. On each input row, the executor calls sfunc for all
> new tuples that enter the window frame, and subfunc for all tuples that
> exit it. Another difference is that finalfunc can be called many times
> during the lifecycle of an aggregate invocation. For example, the
> subfunc for COUNT would decrement the row count by one, and for SUM,
> subfunc would subtract the removed value from the current sum. Perhaps
> we should also support an "opportunistic subfunc", that could either
> perform the subtraction, or return a special value indicating that it
> can't be done, and we need to calculate the aggregate from scratch as if
> subfunc wasn't defined. That would be useful for MIN/MAX, which can be
> implemented by just keeping track of the current MIN/MAX value, and
> "subtracting" any other value than the current MIN/MAX is a no-op, but
> "subtracting" the current MIN/MAX value requires scanning all the rest
> of the tuples from scratch.
>
> PS. Have you looked at the "hypothetical set functions" in SQL2003?

They're kinda neat :)

> PPS. I was just about to send this, when I noticed that LEAD/LAG are in
> fact not in the SQL2003 draft I have at hand. In fact, none of the
> window functions defined there, RANK, DENSE_RANK, PERCENT_RANK,
> CUME_DIST and ROW_NUMBER, require random access to the tuples. Still
> seems like a good idea to be have the flexibility for them.

They're in SQL:2008.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-09-01 21:02:25 Re: Is this really really as designed or defined in some standard
Previous Message Tom Lane 2008-09-01 20:00:58 Re: [Patch Review] TRUNCATE Permission