Re: Underspecified window queries in regression tests

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Subject: Re: Underspecified window queries in regression tests
Date: 2011-10-17 09:48:38
Message-ID: D91C5B29-8B22-4EC2-803D-B7C4677E8EA5@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Oct17, 2011, at 01:09 , Tom Lane wrote:
> Florian Pflug <fgp(at)phlo(dot)org> writes:
>> ... reading those parts again, I realize the it says "When ORDER BY is omitted
>> the *default* frame consists ... ", and that the second quote is followed
>> by a footnote which says
>
>> There are options to define the window frame in other ways, but this tutorial
>> does not cover them. See Section 4.2.8 for details. [3.5, Window Functions]
>
>> So it was just me being thick. Sorry for the noise.
>
> Hmm. Maybe the use of a <footnote> there is too subtle, and we should
> instead have that text in-line (probably in parentheses)? Or we could
> use a <note>, but that's probably too much emphasis.

Inline and in parentheses sounds fine.

In addition, I think we should reword the explanation in 4.2.8 (The SQL Language
/ SQL Syntax / Value Expressions / Window Functions). Instead of that rather
long (and IMHO hard to read) paragraph about possible frame clauses and their
behaviour in the presence or absence of an ORDER BY clause, we should go with
a more algorithmic explanation I think.

Something along these lines maybe:

----------
.) PARTITION BY splits the rows into disjoint partitions. All further processing
happens only inside a single partition

.) In RANGE mode, ORDER BY then splits each partition into an ordered list of
sub-partitions, each containing rows which the ORDER BY considers to be
equivalent.

.) In ROWS mode, OTOH, each sub-partition contains only a single row. Thus, if
there are rows which are considered to be equivalent by the ORDER BY, the
ordering of the sub-partition isn't fully determined.

.) Each row's frame then consists of some consecutive range of sub-partitions.

.) In RANGE mode, that consecutive range can only start at either the first
sub-partition or the current row's sub-partition, and can only end at either
the current row's sub-partition or the last sub-partitions.

.) In ROWS mode, the consecutive range may additional start <n> sub-partitions
(or rows, it's the same thing here) before the current row, and may additionally
end <m> sub-partitions/rows after the current row.

From that, it follows that even with an underspecified sort order, the contents of
each frame are still fully determined in RANGE mode. The ordering of rows within
a frame is not determined, though. So overall, in RANGE mode, a query's result is
only non-deterministic if the window function is sensitive to the ordering of rows
within a frame.

In ROWS mode, OTOH, the contents each frame themselves are not fully determined,
so even an ordering agnostic window function may produce non-deterministic results.
----------

If you think that something along these lines would be an improvement, I can try
to come up with a patch.

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2011-10-17 10:53:53 termination of backend waiting for sync rep generates a junk log message
Previous Message Jan Urbański 2011-10-17 08:51:15 Re: proposal: set GUC variables for single query