Re: Window functions patch v04 for the September commit fest

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Window functions patch v04 for the September commit fest
Date: 2008-09-02 06:51:09
Message-ID: 48BCE25D.8040804@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Gregory Stark wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>
>> sfunc is called once for each input row. Unlike with normal aggregates, sfunc
>> is passed the whole input row, so that e.g RANK can compare it against the
>> previous row, or LEAD can buffer it.
>
> I'm not sure I follow this bit about being passed the whole input row. How
> would that relate to something like lag(x*y, 2) for example?

Well, sfunc needs to be passed not only the whole input tuple, but also
the values of the arguments.

Hmm. The spec says that the offset argument of lead and lag needs to be
a numeric literal. To enforce that at parse time, we'd have to hard-code
that into the grammar. Or we could check it at run-time, and throw an
error if its value changes across invocations of the sfunc function.

>> outfunc is a set-returning function, and it is called until it returns no more
>> rows, after each sfunc invocation.
>
> And in any case I feel like this is abstraction on the cheap. The only reason
> it's so general is because it's leaving all the work to the functions to
> implement.
>
> It also means we get no benefit from cases like
>
> SELECT lag(x,5),lag(y,5),lag(z,5)
>
> where the executor could keep one tuplestore and for all of them. For that
> matter it could keep one tuplestore even for cases like lag(x,5),lag(y,4).

True. I guess it comes down to how much complexity we're willing to have
in the executor node to handle that more efficiently.

Note that the memory usage is the same either way, except for the extra
constant overhead of multiple tuplestores, because each tuplestore
inside the lag implementation only needs to store its own column.

> What would the executor do for a query like
>
> SELECT lead(x,1),lead(y,2),lead(y,3)
>
> It would not only have to keep a tuplestore to buffer the output but it would
> have to deal with receiving data from different SRFs at different times. The
> best approach I can think of would be to keep a tuplestore for each SRF using
> themas queues, reading old values from the head as soon as they all have at
> least one new value in them.

Hitoshi solved that creating a separate Window node for each window
function. So the plan tree for that would look something like:

Window (lead(x,1))
Window (lead(y,2))
Window (lead(y,3))
Seq Scan ...

That keeps the Window node implementation quite simple because it only
needs to handle one window function at a time.

> And it doesn't answer how to deal with things like
>
> SELECT lag(x,1) OVER (ORDER BY a), lag(x,1) OVER (ORDER BY b)
>
> I, uh, don't actually have any ideas of how to deal with that one :(

The above handles that by putting extra sort nodes in between the Window
nodes. Not too efficient, but I don't see any way around it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2008-09-02 06:55:05 Re: Window functions patch v04 for the September commit fest
Previous Message Pavel Stehule 2008-09-02 06:46:19 Re: Is this really really as designed or defined in some standard