Quick Links

Re: analyzing query results

From:	"Lonni J Friedman" <netllama(at)gmail(dot)com>
To:	"Jon Sime" <jsime(at)mediamatters(dot)org>
Cc:	pgsql-novice(at)postgresql(dot)org
Subject:	Re: analyzing query results
Date:	2007-08-13 19:49:49
Message-ID:	7c1574a90708131249p7605b07fv2d01bf0e3fec3516@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-novice

On 8/10/07, Jon Sime <jsime(at)mediamatters(dot)org> wrote:
> Lonni J Friedman wrote:
> > The problem here is that occasionally the same subtest runs more than
> > once in a 24 hour period (although not often), and not all of the test
> > runs will have the same result (current_status). To make matters more
> > complicated when this happens, the additional runs will have a
> > current_status of PASSED, yet my automated report still grabs the
> > older FAILED test result, and this confuses the hell out of the people
> > reviewing the report, since its technically passed already.
>
> DDL and some sample data would be useful to help out, but if I
> understood what you were describing, here's a simple mockup I tossed
> into my local playground:
>
> create table testresults (
> id serial not null primary key,
> subtest text not null,
> os text not null,
> arch text not null,
> info text not null,
> current_status text not null,
> date_created date not null);
>
> insert into testresults
> (subtest, os, arch, info, current_status, date_created)
> values
> ('a','linux','x86','info','PASSED',now()),
> ('b','linux','x86','info','PASSED',now()),
> ('c','linux','x86','info','FAILED',now()),
> ('c','linux','x86','info','FAILED','2007-07-01'),
> ('c','linux','x86','info','PASSED','2007-07-01'),
> ('d','linux','x86','info','FAILED',now()),
> ('d','linux','x86','info','PASSED',now()),
> ('e','linux','x86','info','FAILED','2007-07-01');
>
> select a.*
> from testresults a
> left join testresults b on (b.id <> a.id and
> b.date_created >= a.date_created and
> b.current_status = 'PASSED' and
> row(b.subtest, b.os, b.arch)
> = row(a.subtest, a.os, a.arch))
> where a.current_status = 'FAILED' and b.id is null and
> a.date_created >= now() - interval '24 hours';
>
> id | subtest | os | arch | info | current_status | date_created
> ----+---------+-------+------+------+----------------+--------------
> 8 | c | linux | x86 | info | FAILED | 2007-08-10
> (1 row)
>
> The row constructors within the join should be adjusted to match the set
> of columns within the table that would adequately match two separate
> rows as having been the same subtest on a different run (I just picked
> the three that stood out from your original query as possibly being
> identifying).
>
> This may well not be the best way to do it. I've obviously only tested
> it against a very tiny set of data. And without any DDL or sample data,
> I may have misunderstood the problem.

First, thanks much for your input. I'm sorry that I didn't provide
sample data up front.

I'll go into more detail now. In my database, I have one column whose
content is always the same, for each iteration of a subtest. That
column is named 'cudacode'. Thus, if a given subtest runs multiple
times in any 24 hour window, each inserted row will always have the
same cudacode for that subtest. So cudacode is the unique identifier.
That said, here's some example data that illustrates the problem i'm
having:

In the above data, id 13 & 14 are two iterations of the same subtest
(they have the same cudacode), yet the first has a current_status of
FAILED, and the second is PASSED. What I want is a query that will
detect this and return nothing because the later row (per the
date_created field) has PASSED, since all I care about is when the
last iteration has FAILED. id 15 has a different cudacode, and
therefore should be returned from the query as well. In other words,
I'd like to see the content of row 15 returned, since its the only one
that meets my criteria.

Or as a starting point, just get rows 14 & 15 returned, since 14 & 15
are the newest unique instances of cudacode, and then I can check the
current_status in those as a secondary query. I tried to give that a
whirl below:

select a.* from cudaapps a left join cudaapps b on (b.cudacode <>
a.cudacode and b.date_created >= a.date_created and b.current_status =
'PASSED' and row(b.subtest, b.os, b.arch) = row(a.subtest, a.os,
a.arch)) where a.current_status = 'FAILED' and b.id is null and
a.date_created >= now() - interval '24 hours';

but for reasons that aren't clear to me, I'm getting id 13 & 15
returned instead of 14 & 15, and I can't figure out why.

Separately, I tried to use Tom's advice to call DISTINCT ON, with the
following query:

SELECT DISTINCT ON(date_created, cudacode) id,current_status FROM
cudaapps WHERE (SELECT now() - interval '24 hours' <
date_created::timestamp)='t' ORDER BY date_created;

but that too is returning id 13, 14 & 15. If I strip out the
date_created in the DISTINCT ON and ORDER BY sections:
SELECT DISTINCT ON(cudacode) id,current_status FROM cudaapps WHERE
(SELECT now() - interval '24 hours' < date_created::timestamp)='t' ;

then I get id 13 & 15 all over again, rather than 14 & 15.

I'm guessing that I'm missing something really obvious here, but I
can't quite figure out what.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama(at)gmail(dot)com
LlamaLand http://netllama.linux-sxs.org

In response to

Re: analyzing query results at 2007-08-11 01:03:32 from Jon Sime

Browse pgsql-novice by date

	From	Date	Subject
Next Message	Lonni J Friedman	2007-08-13 20:02:15	Re: analyzing query results
Previous Message	Tom Lane	2007-08-13 17:42:13	Re: resetting superuser password