Re: group by range of values

From: Michael Glaesemann <grzm(at)seespotcode(dot)net>
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: "Carol Cheung" <cacheung(at)consumercontact(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: group by range of values
Date: 2007-07-27 20:43:03
Message-ID: 6F7EBD6A-A1FD-44A8-B4E6-EA415DFA3535@seespotcode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

2007/7/27, Carol Cheung <cacheung(at)consumercontact(dot)com>:

> db=# select * from tester order by birth_year;
> birth_year | salary
> ------------+--------
> 1946 | 78000
> 1949 | 61000

What is the data type of the birth_year column? I'd suggest using
date if you can, as what it is is a date with year precision. You
can't specify such a precision, but you can decide that all
birth_year's will have month and year of January 1 (enforced by a
CHECK constraint, if you wish), or you could just choose to ignore
the month and year part in your calculations.

> How can I display the average salary grouped by decade of birth year?
> That is, is it possible to display the average salary of those born in
> the 1940's, the average salary of those born in the 1950's, average
> salary of those born in the 1960's, and those born in the 1970's,
> all in
> one result table?
> Something like:
>
> decade | average(salary)
> -------+-----------------
> 1940 | 69500
> 1950 | 53333.33
> 1960 | 53000
> 1970 | 40333.33

Here's an example:

CREATE TABLE salaries (birth_year DATE PRIMARY KEY, salary NUMERIC
NOT NULL);

INSERT INTO salaries (birth_year, salary) VALUES
('1946-01-01',78000), ('1949-01-01',61000), ('1951-01-01',58000)
, ('1953-01-01',56000), ('1958-01-01',52000), ('1962-01-01',50000)
, ('1965-01-01',45000), ('1967-01-01',60000), ('1968-01-01',57000)
, ('1970-01-01',47000), ('1972-01-01',32000), ('1973-01-01',42000);

SELECT birth_decade, AVG(salary)
FROM (
SELECT birth_year
, date_trunc('decade', birth_year)::date as birth_decade
, salary
FROM salaries) as salaries_with_decades
GROUP BY birth_decade
ORDER BY birth_decade;
birth_decade | avg
--------------+--------------------
1940-01-01 | 69500.000000000000
1950-01-01 | 55333.333333333333
1960-01-01 | 53000.000000000000
1970-01-01 | 40333.333333333333
(4 rows)

If birth_year is an integer column, here's another way to do it,
taking advantage of the fact that integer division truncates.

CREATE TABLE salaries (birth_year INTEGER PRIMARY KEY, salary NUMERIC
NOT NULL);

INSERT INTO salaries (birth_year, salary) VALUES
(1946,78000), (1949,61000), (1951,58000), (1953,56000),
(1958,52000)
, (1962,50000), (1965,45000), (1967,60000), (1968,57000),
(1970,47000)
, (1972,32000), (1973,42000);

SELECT birth_decade, AVG(salary)
FROM (
SELECT birth_year
, birth_year / 10 * 10 as birth_decade
, salary
FROM salaries) as salaries_with_decades
GROUP BY birth_decade
ORDER BY birth_decade;
birth_decade | avg
--------------+--------------------
1940 | 69500.000000000000
1950 | 55333.333333333333
1960 | 53000.000000000000
1970 | 40333.333333333333
(4 rows)

Hope this gives you some options.

Michael Glaesemann
grzm seespotcode net

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Marcin Krawczyk 2007-07-28 20:54:21 raise exception and transaction handling
Previous Message Rodrigo De León 2007-07-27 19:31:32 Re: group by range of values