Re: window function ordering not working as expected

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Lonni J Friedman <netllama(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: window function ordering not working as expected
Date: 2015-02-18 00:18:01
Message-ID: 17807.1424218681@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Lonni J Friedman <netllama(at)gmail(dot)com> writes:
> I'm interested in seeing:
> * the date for the most recent result
> * test name (identifier)
> * most recent result (decimal value)
> * the worst (lowest decimal value) test result from the past 21 days
> * the date which corresponds with the worst test result from the past 21 days
> * the 2nd worst (2nd lowest decimal value) test result
> ...
> The problem that I'm seeing is in the prv_score column. It should show
> a value of 0.6, which corresponds with 2015-02-13, however instead its
> returning 0.7. I thought by ordering by metrics->>'PT TWBR' I'd always
> be sorting by the scores, and as a result, the lead(metrics->>'PT
> TWBR', 1) would give me the next greatest value of the score. Thus my
> confusion as to why ORDER BY metrics->>'PT TWBR' isn't working as
> expected.

lead() and lag() retrieve values from rows that are N away from the
current row in the specified ordering. That isn't what you want here
AFAICS.

I think the worst test result would be obtained with
nth_value(metrics->>'PT TWBR', 1)
which is equivalent to what you used,
first_value(metrics->>'PT TWBR')
while the 2nd worst result would be obtained with
nth_value(metrics->>'PT TWBR', 2)

However, "worst" and "2nd worst" with this implementation would mean
"worst and 2nd worst within the partition", which isn't the stated
goal either, at least not with the partition definition you're using.

What you really want for the "worst in last 21 days" is something like

min(metrics->>'PT TWBR') OVER (
PARTITION BY ... that same mess you used ...
ORDER BY tstamp
RANGE BETWEEN '21 days'::interval PRECEDING AND CURRENT ROW)

However Postgres doesn't implement RANGE x PRECEDING yet. You could
get "worst in last 21 observations" easily:

min(metrics->>'PT TWBR') OVER (
PARTITION BY ... that mess ...
ORDER BY tstamp
ROWS BETWEEN 20 PRECEDING AND CURRENT ROW)

and maybe that's close enough.

I do not know an easy way to get "second worst" :-(. You could build a
user-defined aggregate to produce "second smallest value among the inputs"
and then apply it in the same way as I used min() here.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2015-02-18 00:31:18 Re: Issue dumping schema using readonly user
Previous Message Stephen Frost 2015-02-17 23:58:30 Re: Issue dumping schema using readonly user