Re: aggregate query

From: Andrew Kroeger <andrew(at)sprocks(dot)gotdns(dot)com>
To: Raj A <raj(dot)ayappan(at)gmail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: aggregate query
Date: 2007-05-29 08:04:47
Message-ID: 465BDE9F.4010605@sprocks.gotdns.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Raj A wrote:
> I have a table
>
> CREATE TABLE survey_load
> (
> meter_id character(5) NOT NULL,
> number_of_bays integer NOT NULL,
> bay_1_use integer,
> bay_2_use integer,
> bay_3_use integer,
> bay_4_use integer,
> bay_5_use integer,
> date date NOT NULL,
> inspection_id integer NOT NULL DEFAULT,
> )
>
> How do i present an aggregate query
>
> inspection_id | meter_id | bay_use
> 1 12345 (value of bay_1_use)
> 1 12345 (value of bay_2_use)
> 1 12345 (value of bay_3_use)
> 2 23456 (value of bay_1_use)
> 2 23456 (value of bay_2_use)
> 2 23456 (value of bay_3_use)
> 2 23456 (value of bay_4_use)
> 2 23456 (value of bay_5_use)

Firstly, the table definition you provided has an invalid specification
for the inspection_id column. For my examples, I simply used a "NOT
NULL" constraint, and did not implement any DEFAULT clause.

If I understand your issue correctly, it seems like the denormalized
nature of your table is causing you some problems. With the table
definition and output you provided, I am assuming the following contents
for the survey_load table:

meter_id | number_of_bays | bay_1_use | bay_2_use | bay_3_use |
bay_4_use | bay_5_use | date | inspection_id
----------+----------------+-----------+-----------+-----------+-----------+-----------+------------+---------------
12345 | 3 | 11 | 12 | 13 |
| | 2007-05-29 | 1
23456 | 5 | 21 | 22 | 23 |
24 | 25 | 2007-05-29 | 2
(2 rows)

The value of the number_of_bays column is driving the number of results
in your summary query, as opposed to allowing the data relations to
drive that on their own with a normalized representation. You could
normalize your data representation into 2 tables:

tmp1=> \d+ new_survey_load
Table "new_survey_load"
Column | Type | Modifiers | Description
---------------+--------------+-----------+-------------
survey_id | integer | not null |
meter_id | character(5) | not null |
date | date | not null |
inspection_id | integer | not null |
Indexes:
"new_survey_load_pkey" PRIMARY KEY, btree (survey_id)
Has OIDs: no

tmp1=> \d+ new_bay_use
Table "new_bay_use"
Column | Type | Modifiers | Description
------------+---------+-----------+-------------
survey_id | integer | not null |
bay_number | integer | not null |
bay_use | integer | not null |
Indexes:
"ind_new_bay_use__survey_id" btree (survey_id)
Foreign-key constraints:
"new_bay_use_survey_id_fkey" FOREIGN KEY (survey_id) REFERENCES
new_survey_load(survey_id)
Has OIDs: no

This normalization also makes it easier in the future to track
additional bays without having to alter the table structure.

Following your example from above, those tables would contain the
following values:

tmp1=> select * from new_survey_load ;
survey_id | meter_id | date | inspection_id
-----------+----------+------------+---------------
13 | 12345 | 2007-05-29 | 1
14 | 23456 | 2007-05-29 | 2
(2 rows)

tmp1=> select * from new_bay_use ;
survey_id | bay_number | bay_use
-----------+------------+---------
13 | 1 | 11
13 | 2 | 12
13 | 3 | 13
14 | 1 | 21
14 | 2 | 22
14 | 3 | 23
14 | 4 | 24
14 | 5 | 25
(8 rows)

The query to get the summary results you are looking for would then be:

select
l.inspection_id,
l.meter_id,
u.bay_use
from
new_survey_load l
join new_bay_use u on
l.survey_id = u.survey_id
order by
l.inspection_id, l.meter_id, u.bay_number;

You could also use a view to join the 2 new tables together to match the
format of your original table definition:

select
l.meter_id,
count(u.bay_number) as number_of_bays,
u1.bay_use as bay_1_use,
u2.bay_use as bay_2_use,
u3.bay_use as bay_3_use,
u4.bay_use as bay_4_use,
u5.bay_use as bay_5_use,
l.date,
l.inspection_id
from
new_survey_load l
join new_bay_use u on
l.survey_id = u.survey_id
left join new_bay_use u1 on
l.survey_id = u1.survey_id and u1.bay_number = 1
left join new_bay_use u2 on
l.survey_id = u2.survey_id and u2.bay_number = 2
left join new_bay_use u3 on
l.survey_id = u3.survey_id and u3.bay_number = 3
left join new_bay_use u4 on
l.survey_id = u4.survey_id and u4.bay_number = 4
left join new_bay_use u5 on
l.survey_id = u5.survey_id and u5.bay_number = 5
group by
1,3,4,5,6,7,8,9;

If you add more bays for tracking, you can simply add additional joins
against new_bay_use to mimic what you would have had in your original
table structure.

Hope this helps.

Andrew

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Gregory Stark 2007-05-29 10:50:29 Re: aggregate query
Previous Message Raj A 2007-05-29 04:32:32 aggregate query