Re: Underspecified window queries in regression tests

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Subject: Re: Underspecified window queries in regression tests
Date: 2011-10-16 18:36:36
Message-ID: BEAC1AFC-8A2B-48FA-9682-993AF0A68427@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Oct16, 2011, at 20:04 , Greg Stark wrote:
> On Fri, Oct 14, 2011 at 9:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> We could hack around this by adding more columns to the result so that
>> an index-only scan doesn't work. But I wonder whether it wouldn't be
>> smarter to add ORDER BY clauses to the window function calls. I've been
>> known to argue against adding just-in-case ORDER BYs to the regression
>> tests in the past; but these cases bother me more because a plan change
>> will not just rearrange the result rows but change their contents,
>> making it really difficult to verify that nothing's seriously wrong.
>
> I'm not sure if it applies to this case but I recall I was recently
> running queries on Oracle that included window functions and it
> wouldn't even let me run them without ORDER BY clauses in the window
> definition. I don't know if it cleverly determines that the ORDER BY
> will change the results or if Oracle just requires ORDER BY on all
> window definitions or what.

I was about to point out that whether or not ORDER BY is required probably
ought to depend on whether the window function acts on the frame, or the
whole partition. And that oracle quite likely knows that for the individual
window functions while we don't.

But, actually, our documentation states in 3.5, Window Functions, that

By default, if ORDER BY is supplied then the frame consists of all rows
from the start of the partition up through the current row, plus any
following rows that are equal to the current row according to the ORDER BY
clause. When ORDER BY is omitted the default frame consists of all rows
in the partition.

Either I'm confused, or that doesn't match the observed regression test
failure.

I did a few experiments. Assume that table "d" contains three rows with
v=1, v=2 and v=3.

This case seems to work (the frame is always the whole partition, even
though the frame_clause says "between unbounded preceding and current row"):

postgres=# select array_agg(v) over (range between unbounded preceding
and current row) from d;
array_agg
-----------
{1,2,3}
{1,2,3}
{1,2,3}

Once one adds an ORDER BY, the frame_clause is taken into account, so
that works too:

postgres=# select array_agg(v) over (order by v range between unbounded preceding
and current row) from d;
array_agg
-----------
{1}
{1,2}
{1,2,3}

But some frame clauses (row 1 preceding, for example) have an effect despite
there being no ORDER BY, like here:

postgres=# select array_agg(v) over (rows 1 preceding) from d;
array_agg
-----------
{1}
{1,2}
{2,3}

ISTM that we probably should ignore frame clauses, unless there's an ORDER BY.
Or maybe even throw an error?

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitri Fontaine 2011-10-16 18:49:33 Re: proposal: set GUC variables for single query
Previous Message Tom Lane 2011-10-16 18:26:38 Re: Pushing ScalarArrayOpExpr support into the btree index AM