
Re: More of a SQL question, I guess.

From: Frank Bax <fbax(at)sympatico(dot)ca>
To: pgsql-novice(at)postgresql(dot)org
Subject: Re: More of a SQL question, I guess.
Date: 2007-08-14 14:36:26
Message-ID: 46C1BDEA.6050209@sympatico.ca
Lists: pgsql-novice
Howard Eglowstein wrote:
> I have a database with 150 fields per row. For historical reasons, it's 
> broken into three databases, data_a, data_b and data_c. One of the 
> fields is a timestamp and one of them is a serial number that came in 
> from a remote machine that reports its status every day.
> 
> If I use a command like "SELECT MAX(Timestamp) FROM data_a WHERE field1 
> = '0001'", I can get back the time of the latest report. I can then issue 
> the command "SELECT * FROM data_a, data_b, data_c WHERE 
> data_a.id=data_b.id AND data_a.id=data_c.id AND field1='0001'" to get the 
> 150 fields for that report. It works fine, but it takes a while to 
> respond when the database is the better part of a million records.
> 
> If it were just one serial number, the two queries would be okay because 
> It really only takes a minute. The problem I have is that we're hoping 
> to have thousands of machines in the field (this is just a test 
> database) and clearly this approach won't work for thousands of serial 
> numbers. The second method I tried was to simply use "SELECT * FROM 
> data_a....." to get all million records, and have my C code look for 
> serial numbers in each line and keep the latest by timestamp. That takes 
> about as long as doing the first procedure 3 times, but it gives me the 
> latest data for all of the serial numbers in the system. That's 
> perfectly cool, except that it won't scale nicely. If the web code that 
> does the search isn't on the same machine that holds the data, we'll 
> have to ship gigabytes of data over the network for each search.
> 
> What I'd *like* is something that uses groups and MAX() to do this in 
> one SQL command. It would group the data by the serial number (field1), 
> find the record in each group with the maximum timestamp value and 
> return all 150 fields. Is this possible?  I thought of using unions, but 
> I think I have to issue a pretty long command for each group and the PG 
> buffers probably will max out after some relatively small number, no?
> 
> Any thoughts would be appreciated.


Start with

SELECT field1,MAX(timestamp) as ts FROM data_a GROUP BY field1

to get the latest timestamp for each value of field1; then use a
self-join to get the id (and the other columns) from those rows.  At
this point you have a problem if a serial number has duplicate
timestamp values.

SELECT id, a.field1, ts FROM data_a JOIN (SELECT field1, MAX(timestamp)
AS ts FROM data_a GROUP BY field1) AS a
ON a.field1 = data_a.field1 AND a.ts = data_a.timestamp;
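As an aside, PostgreSQL's DISTINCT ON extension can sidestep the
self-join (and the duplicate-timestamp ambiguity) entirely; a minimal
sketch, assuming the same data_a table and columns as above:

```sql
-- Keep only the newest row per serial number (PostgreSQL-specific).
-- Within each field1 group, the first row of the ORDER BY wins;
-- ties on timestamp are broken arbitrarily unless you extend ORDER BY.
SELECT DISTINCT ON (field1) *
FROM data_a
ORDER BY field1, timestamp DESC;
```

The trade-off is portability: DISTINCT ON is not standard SQL, so the
GROUP BY/join form travels better to other databases.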

Call this select SQL1 and join it to your longer SQL statement:

SELECT * FROM data_a
JOIN (SQL1) AS aa ON aa.id = data_a.id
JOIN data_b ON data_a.id = data_b.id
JOIN data_c ON data_a.id = data_c.id;
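Spelled out with SQL1 inlined (a sketch assuming the id, field1 and
timestamp columns used above), the whole thing becomes something like:

```sql
-- Latest report per serial number, with all 150 fields, in one query.
SELECT data_a.*, data_b.*, data_c.*
FROM data_a
JOIN (SELECT field1, MAX(timestamp) AS ts
      FROM data_a
      GROUP BY field1) AS aa
  ON aa.field1 = data_a.field1
 AND aa.ts     = data_a.timestamp
JOIN data_b ON data_a.id = data_b.id
JOIN data_c ON data_a.id = data_c.id;
```

This is one round trip, and only the latest row for each serial number
ever crosses the network, which is what you want for the web-server case.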



It seems to me the underlying question here comes up often.  Might I
suggest that we add a new question to section 4 of FAQ:
Q: How can I retrieve the other fields in the same row as the result
returned by an aggregate function such as max()?
A:  Is my "SQL1" the best/only answer to this question?


