Re: Window functions patch v04 for the September commit fest

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Window functions patch v04 for the September commit fest
Date: 2008-09-03 07:25:16
Message-ID: 48BE3BDC.3060601@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hitoshi Harada wrote:
> 2008/9/2 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
>> Hitoshi Harada wrote:
>>> 2008/9/2 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
>>> In my understanding, the "Window Frame" is defined
>>> by clauses such like "ROWS BETWEEN ... ", "RANGE BETWEEN ... " or so,
>>> contrast to "Window Partition" defined by "PARTITION BY" clause. A
>>> frame slides within a partition or there's only one frame if those
>>> clauses are not specified. The current patch has not implemented this
>>> yet. I'll update the docs.
>> Yes, that's how I read it as well. Another way to think of it is that
>> there's always a window frame, but if no ROWS BETWEEN or similar clause is
>> given, the window frame spans the whole partition (or from the 1st row of
>> the partition, up to the current row, if ORDER BY is given).
>>
>>> I don't like to call the second type "ranking aggregates" because it
>>> may refer to only ranking functions though there are more types of
>>> function like ntile(), lead() and lag(). But "window functions"
>>> doesn't seem appropriate either since it's ambiguous with the general
>>> name of window expressions.
>> Yep, confusing :-(. The SQL standard says that "a window function is one of:
>> a rank function, a distribution function, a row number function, a window
>> aggregate function, the ntile function, the lead function, the lag function,
>> the first-value function, the last-value function, the nth-value function".
>> So, window aggregate functions are a special class of window functions, and
>> there's no term to refer to all the rest of the window functions excluding
>> window aggregate functions.
>>
>> Your doc SQL spec
>> Window expression Window function
>> Window function Any window function other than a window aggregate
>> function
>> Window aggregate Window aggregate function
>>
>> I tried to coin the term "ranking aggregate" for the SQL2008 term "Any
>> window function other than a window aggregate function", but you're right
>> that that's still confusing, because the SQL2008 term "rank function"
>> includes only RANK() and DENSE_RANK().
>>
>> The spec calls them "group aggregate functions", when they're used with
>> GROUP BY, rather than as a window function. I think we could use that term.
>
> Agree. So from now on, we use "window functions" for all kinds of
> functions including window aggregates. "Window expression" is
> discarded. "Window functions" also means the mechanism to support
> these functions to process and this project.
>
>>> Your proposal is smarter than the current implementation. But it
>>> doesn't seem complete in some way. From logical point of view, the
>>> window functions should be allowed to access whole of rows in the
>>> frame the current row belongs to (c.f. inverse distribution
>>> functions).
>> By the whole of rows, do you mean
>> a) the chosen value or expression of all rows, or
>> b) all columns of the input rows?
>
> a). I mean all input rows in a window frame. But later I found
> "inverse distribution function" is not one of window functions. That
> is actually one of aggregate functions. Forget about it.
>
>> Different window functions have different needs. RANK() for example does
>> need to see all columns, to compare them, but it only needs access to the
>> previous and current row. CUME_DIST on the other hand needs access to all
>> columns of all rows, and LAG needs access to a specific column of a fixed
>> number of rows. And row_number needs nothing.
>>
>> The needs of access to the rows are so different that it seems best to me to
>> delegate the buffering to the window function.
>
> Delegating optimization to them depending on functions' needs is a
> good idea. So executor can concentrate on the window function process
> flow. Let's unify it in executor and let trivial optimizations get
> into individual functions.
>
>> Actually, looking closer to the ranking functions, they don't really need
>> access to all columns. They just need to be able to compare them, according
>> to the window ordering, so we should probably only provide access to the
>> arguments of the aggregate, evaluated for any row in the frame, and a
>> comparator function, that can determine if any given rows in the frame are
>> <, = or >.
>
> That is kind of problem. If your task is only to define that window
> node executor simply stores window frame rows and pass them to window
> functions as they need, the rank functions' needs don't come. As you
> point out, rank functions need ordering key columns and its
> comparators. So you have to imagine what comes next? What will be
> wanted other than ordering key columns, if we think about "universe
> window functions" much more than SQL spec says?

It might be a good idea to google around what window functions other
DBMSs support, and see if this scheme could support all of them. I
couldn't find any that it couldn't, but I didn't look very hard.

>> Let's look at the trivial, generic, and slow implementation first, and then
>> figure out what tricks we can do to make it faster. I gather that that
>> generic algorithm, following how the SQL spec defines the window frame,
>> looks like this:
>
> So as to satisfy all of window functions' needs, Window object does
> well. But it is so heavy that we need to optimize as functions need,
> by reducing stored rows and avoiding to call redundant functions.
> Still I'm afraid if user can define such a complex function...

An average user probably can't. Creating a regular aggregate using the
current method is not exactly trivial either.

At the moment, I'm more worried about finding a good abstraction to use
within the backend, and worry about exposing it to user-defined
functions later.

>> Here's sketches of some aggregate implementations using this API:
>>
>> RANK()
>> ------
>> enterfunc:
>> if position of the new row is > current row, do nothing. Otherwise,
>> decrement current rank.
>> exitfunc:
>> if position of the new row is > current row, do nothing. Otherwise,
>> increment current rank.
>
> I don't see why the row's positions affect rank. The rank depends on
> comparing two rows ordering columns, doen't it?

Yes. Imagine a window frame like this (e.g. from "ROWS BETWEEN 2
PRECEDING AND 2 FOLLOWING"):

10
20
30 <-- current row
40
50

RANK() doesn't care about rows that enter frame, that are after current
row. For example, if 60 enters the frame:

10
20
30 <-- current row
40
50
60

The RANK() of the current row is still 3, as it was before. RANK() only
advances when the current row is advanced, or new rows that precede the
current row enter the frame, e.g if a peer of the current row enters the
frame, which is possible with a window-frame-exclusion clause:

10
20
30
30 <-- current row
40
50
60

>
>> I'd suggest:
>>
>> 1. Implement Window node, with the capability to invoke an aggregate
>> function, using the above API. Implement required parser/planner changes.
>> Implement a few simple ranking aggregates using the API.
>> 2. Implement glue to call normal aggregates through the new API
>> 3. Implement the optimization to drop rows that are no longer needed
>> (signal_cutoff can be a no-op until this phase)
>> 4. Implement window framing (the frame can always be all rows in the
>> partition, or all rows until the current row, until this phase)
>> 5. Expose the new API to user-defined aggregates. It can be an internal API
>> only used by built-in functions until this phase
>>
>> I believe you already have phase 1 in your patch, except for the API
>> changes.
>
> I am willing to challenge to implement the API above, after maintain
> the current patch adding docs and tests. Since the API includes
> changes much more like Aggregate syntax than current patch, I'm not
> sure if I can finish it by next commit fest, which is said to be
> "feature freeze". For safety, remain the current patch to review
> excluding API and executor then if I fail to finish use it for next
> release. Git helps it by cutting a branch, does it? How do you think?

We do allow changes to the user manual after the feature freeze, so I'd
suggest concentrating on the code and tests first. Code comments and
internal docs are important, though, for easy review.

I'm sure we won't get all the way to phase 5 for 8.4, but if we can even
get 1-3, plus some of the most important window functions, this this
will be a great release!

I'll review the parser/planner changes from the current patch.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2008-09-03 08:17:16 Re: Window functions patch v04 for the September commit fest
Previous Message Brendan Jurd 2008-09-03 07:15:36 pg_typeof() (was: Mysterious Bus Error with get_fn_expr_argtype())