Re: Windowing Function Patch Review -> Standard Conformance

From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Heikki Linnakangas'" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "'Hitoshi Harada'" <umi(dot)tanuki(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Windowing Function Patch Review -> Standard Conformance
Date: 2008-11-30 19:11:10
Message-ID: 9BB7F4690A644063B48FBD57963652EA@amd64
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> -----Original Message-----
> From: Heikki Linnakangas [mailto:heikki(dot)linnakangas(at)enterprisedb(dot)com]
> Sent: 26 November 2008 09:09
> To: Hitoshi Harada
> Cc: David Rowley; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: Windowing Function Patch Review -> Standard Conformance
>
> Hitoshi Harada wrote:
> > 2008/11/26 David Rowley <dgrowley(at)gmail(dot)com>:
> >> I'm at a bit of a loss to what to do now. Should I wait for your work
> >> Heikki? Or continue validating this patch?
> >>
> >> The best thing I can think to do right now is continue and any problems
> I
> >> find you can add regression tests for, then if we keep your regression
> tests
> >> for Heikki's changes then we can validate those changes more quickly.
> >>
> >> Any thoughts? Better ideas?
> >
> > Thanks to your great tests, we now know much more about specification
> > and where to fail easily, so continuing makes sense but it may be good
> > time to take a rest and wait for Heikki's patch completing.
>
> Here's another updated patch, including all your bug fixes.
>
> There's two known issues:
> - ranking functions still don't treat peer rows correctly.
>
> - I commented out the "this function requires ORDER BY clause in the
> window" test in rank_up, because a window function shouldn't be poking
> into the WindowState struct like that. I wonder if it's really needed?
> In section 7.11, the SQL2008 spec says "if WD has no window ordering
> clause, then the window ordering is implementation-dependent, and *all
> rows are peers*". The regression test now fails because of this, but the
> current behavior actually seems correct to me.
>

Thanks for the updated patch. I've been doing a little testing over the
weekend.

I'm running into a small problem with passing results from normal aggregate
functions into the window function. Hitoshi and I saw this before:

SELECT depname,SUM(SUM(salary)) OVER (ORDER BY depname) FROM empsalary GROUP
BY depname;
ERROR: variable not found in subplan target list

Hitoshi fixed this in his 2008-11-12 patch, but it seems to have found its
way back in.

I was also reading over the standard tonight. I've discovered that the
OFFSET in LEAD() and LAG() is optional. It should default to 1 if it is not
present. Oracle seems to support this.

SQL2008 says:
> If <lead or lag function> is specified, then:
> i) Let VE1 be <lead or lag extent> and let DT be the declared type of VE1.
> ii) Case:
> Scalar expressions 209
> WD 9075-2:200w(E)
> 6.10 <window function>
> If <offset> is specified, then let OFF be <offset>. The declared type of
>OFF shall be an
> exact numeric type with scale 0 (zero).
> 1)
> 2) Otherwise, let OFF be 1 (one).

Yet another variant of LEAD() and LAG() but I think well worth it for both
compliance to the standard and compatibility to Oracle.

I've also been thinking about the on-demand buffering for the window
functions that you talked about. In some of my previous performance tests I
noticed that LEAD with a LIMIT clause is not optimal as it will store all
the rows in the tuplestore before applying the LIMIT:

select timestamp,lead(timestamp,1) over (order by id) from bigtable order by
id limit 1;

Is there any way for the on-demand buffering to make the above query faster?
Really only the first 2 rows ordered by id need to be read.

I had previously thought this would be too difficult to implement as the
OFFSET can be variable, but maybe it's possible on-demand. Yet I'd imagine
it's difficult to know when to allow rows to exit the tuplestore as a
LAG(col, someVariableValue) might need those rows later.

My next task is to read more of the standard just in case there is anything
else we should know.

David.

> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2008-11-30 20:13:50 Re: WIP: default values for function parameters
Previous Message Tom Lane 2008-11-30 18:49:36 pgsql: Remove inappropriate memory context switch in