Re: analyzing query results

From: "Lonni J Friedman" <netllama(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: analyzing query results
Date: 2007-08-13 20:02:15
Message-ID: 7c1574a90708131302l74fe01e7h804623030a433dfb@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On 8/10/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Lonni J Friedman" <netllama(at)gmail(dot)com> writes:
> > select subtest,os,arch,info FROM ${table} WHERE (SELECT now() -
> > interval '24 hours' < date_created::timestamp)='t' AND
> > current_status='FAILED' ;
>
> > My problem is that I can't think of a non-trivial way to adjust the
> > SQL query so that it will only capture the last time the subtest ran,
> > in the event that it ran multiple times during a 24 hour window.
>
> If you don't mind using a nonstandard feature, DISTINCT ON will probably
> help you solve this. This example looks pretty close to being the same
> as the "get the latest weather reports" example that you'll find in the
> PG reference page for SELECT.

Here's some example data that illustrates the problem i'm having:

id | date_created |
cudacode | current_status
----+----------------------------+-----------------------------------------------------------------------+----------------------------------------
13 | 2007-08-13 11:00:18.782425 |
2007-05-13.cuda-linux32-test0.CUBLAS.debug.hostEmu.CheckedIn_Compiler
| FAILED
14 | 2007-08-13 11:00:39.917609 |
2007-05-13.cuda-linux32-test0.CUBLAS.debug.hostEmu.CheckedIn_Compiler
| PASSED
15 | 2007-08-13 11:46:09.770971 |
2007-05-13.cuda-linux64-test0.CUBLAS.debug.hostEmu.CheckedIn_Compiler
| FAILED

In the above data, id 13 & 14 are two iterations of the same subtest
(they have the same cudacode), yet the first has a current_status of
FAILED, and the second is PASSED. What I want is a query that will
detect this and return nothing because the later row (per the
date_created field) has PASSED, since all I care about is when the
last iteration has FAILED. id 15 has a different cudacode, and
therefore should be returned from the query as well. In other words,
I'd like to see the content of row 15 returned, since its the only one
that meets my criteria.

Or as a starting point, just get rows 14 & 15 returned, since 14 & 15
are the newest unique instances of cudacode, and then I can check the
current_status in those as a secondary query.

I tried to call DISTINCT ON, with the following query:

SELECT DISTINCT ON(date_created, cudacode) id,current_status FROM
cudaapps WHERE (SELECT now() - interval '24 hours' <
date_created::timestamp)='t' ORDER BY date_created;

but that is returning id 13, 14 & 15. If I strip out the date_created
in the DISTINCT ON and ORDER BY sections:

SELECT DISTINCT ON(cudacode) id,current_status FROM cudaapps WHERE
(SELECT now() - interval '24 hours' < date_created::timestamp)='t' ;

then I get id 13 & 15 all over again, rather than 14 & 15.

I'm guessing that I'm missing something really obvious here, but I
can't quite figure out what.

>
> BTW, why are you writing the timestamp filter condition in such a
> bizarre way? I'd expect to see that query written as
>
> select subtest,os,arch,info FROM ${table}
> WHERE
> (now() - interval '24 hours') < date_created
> AND current_status='FAILED' ;
>
> which would have a fighting chance of using an index on date_created.
> The useless sub-SELECT not only eliminates any chance of using an index,
> but incurs a fair amount of extra runtime overhead.

I'm doing this because date_created is of type char(20) rather than a
normal/expected date datatype. I know, its dumb.

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Howard Eglowstein 2007-08-13 20:41:59 More of a SQL question, I guess.
Previous Message Lonni J Friedman 2007-08-13 19:49:49 Re: analyzing query results