More of a SQL question, I guess.

From: Howard Eglowstein <howard(at)yankeescientific(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: More of a SQL question, I guess.
Date: 2007-08-13 20:41:59
Message-ID: 46C0C217.4080908@yankeescientific.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

I have a database with 150 fields per row. For historical reasons, it's
broken into three databases, data_a, data_b and data_c. One of the
fields is a timestamp and one of them is a serial number that came in
from a remote machine that reports its status every day.

If I use a command like 'SELECT MAX(Timestamp) FROM data_a WHERE field1
= '0001', I can get back the time of the latest report. I can then issue
the command 'SELECT * from data_a, data_b, data_c WHERE
data_a.id=data_b.id AND data_a.id=data_c.id and field1='0001' to get the
150 fields for that report. It works fine, but it takes a while to
respond when the database is the better part of a million records.

If it were just one serial number, the two queries would be okay because
It really only takes a minute. The problem I have is that we're hoping
to have thousands of machine in the field (this is a just a test
database) and clearly this approach won't work for thousands of serial
numbers. The second method I did was to simply use 'SE:ECT * from
data_a.....' to get all million records, and have my C code look for
serial numbers in each line and keep the latest by timestamp. That takes
about as long as doing the first procedure 3 times, but it gives me the
latest data for all of the serial numbers in the system. That's
perfectly cool, except that it won't scale nicely. If the web code that
does the search isn't on the same machine that holds the data, we'll
have to ship gigabytes of data over the network for each search.

What I'd *like* is something that uses groups and MAX() to do this in
one SQL command. It would group the data by the serial number (field1),
find the record in each group with the maximum timestamp value and
return all 150 fields. Is this possible? I thought of using unions, but
I think I have to issue a pretty long command for each group and the PG
buffers probably will max out after some relatively small number, no?

Any thoughts would be appreciated.

Howard

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Tom Lane 2007-08-13 20:42:22 Re: analyzing query results
Previous Message Lonni J Friedman 2007-08-13 20:02:15 Re: analyzing query results