Re: Setting WHERE on a VIEW with aggregate function.

From: "Roger Hand" <RHand(at)kailea(dot)com>
To: "Bill Moseley" <moseley(at)hank(dot)org>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Setting WHERE on a VIEW with aggregate function.
Date: 2005-09-16 19:06:19
Message-ID: DB28E9B548192448A4E8C8A3C1B1E475611CF5@sj1-exch-01.us.corp.kailea.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have a view to generate a list of instructors and a count of their
> future classes.
>
> "instructors" is a link table between "class" and "person".
>
> CREATE VIEW future_instructor_counts
> AS
> SELECT person.id AS person_id,
> first_name,
> last_name,
> count(instructors.class) AS class_count
>
> FROM class, instructors, person
>
> WHERE class.id = instructors.class AND
> person.id = instructors.person
> AND class_time > now()
>
> GROUP BY person_id, first_name, last_name;

The trick is to do the data aggregation separately, then JOIN in whatever other fields you want.

Something like this:

CREATE VIEW future_instructor_counts
AS
SELECT * FROM

(SELECT
person.id AS person_id,
first_name,
last_name) personinfo

INNER JOIN

(SELECT class.id FROM class
WHERE class_time > now() ) classes

INNER JOIN

(SELECT
id, count(class) AS class_count
FROM instructors GROUP BY id) classcount

ON personinfo.person_id = instructors.id
AND classes.id = instructors.id

In many cases when using aggregate functions you get just the fields you need from the agg function (typically an id plus the aggregate result) and JOIN with other tables (or even the same table) to get other info such as first_name, last_name, etc.

Otherwise, if you GROUP BY additional fields so you can get them in the output, you may be making the db do additional work.

> 1) With an aggregate function in the query, is there any way to remove
> the "AND class_time > now()" so that timestamp can be passed in the
> select? That is, I'd like to be able to do this?
>
> select * from instructor_counts where class_time > now();
>
> But class_time is not part of the VIEW so that's not valid.

No problem, just make it a part of the view. See the classes section below.

CREATE VIEW future_instructor_counts
AS
SELECT * FROM

(SELECT
person.id AS person_id,
first_name,
last_name) personinfo

INNER JOIN

-- Add class_time field!
(SELECT class.id, class_time FROM class
WHERE class_time > now() ) classes

INNER JOIN

(SELECT
id, count(class) AS class_count
FROM instructors GROUP BY id) classcount

ON personinfo.person_id = instructors.id
AND classes.id = instructors.id

[Disclaimer: I've not tested this code at all. It could help if you sent table definitions and maybe even dummy
data via insert commands.]

> And if it was included then I don't have an aggregate function any more - no
> more grouping.

If you do the agg function separately like this that isn't an issue. You join tables to get whatever fields you'd like to have in your output.

> 2) I think I'm missing something obvious. I know that I need to
> specify all my non-aggregate columns in the "GROUP BY", but I don't
> under stand why. Really, the results are just grouped only by
> person.id so why the need to specify the other columns.
>
> And if you don't specify all the columns then Postgresql reports:
>
> ERROR: column "person.id" must appear in the GROUP BY
> clause or be used in an aggregate function
>
> Is there a reason Postgresql doesn't just add the column
> automatically? It does in other cases (like a missing table in a
> join).

As I mention above, if you GROUP BY additional fields just to get them in the output, you may be making the db do additional work.

I seem to remember that in a later SQL standard (ie, after SQL-99 but I could be wrong) I believe it allows you to specify additional fields in SELECT that are not in the GROUP BY clause. But PG isn't there yet.

-Roger

-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org
[mailto:pgsql-general-owner(at)postgresql(dot)org]On Behalf Of Bill Moseley
Sent: Friday, September 16, 2005 11:30 AM
To: pgsql-general(at)postgresql(dot)org
Subject: [GENERAL] Setting WHERE on a VIEW with aggregate function.

I have a view to generate a list of instructors and a count of their
future classes.

"instructors" is a link table between "class" and "person".

CREATE VIEW future_instructor_counts
AS
SELECT person.id AS person_id,
first_name,
last_name,
count(instructors.class) AS class_count

FROM class, instructors, person

WHERE class.id = instructors.class AND
person.id = instructors.person
AND class_time > now()

GROUP BY person_id, first_name, last_name;

I have two very basic SQL questions:

1) With an aggregate function in the query, is there any way to remove
the "AND class_time > now()" so that timestamp can be passed in the
select? That is, I'd like to be able to do this?

select * from instructor_counts where class_time > now();

But class_time is not part of the VIEW so that's not valid. And if it
was included then I don't have an aggregate function any more - no
more grouping.

2) I think I'm missing something obvious. I know that I need to
specify all my non-aggregate columns in the "GROUP BY", but I don't
under stand why. Really, the results are just grouped only by
person.id so why the need to specify the other columns.

And if you don't specify all the columns then Postgresql reports:

ERROR: column "person.id" must appear in the GROUP BY
clause or be used in an aggregate function

Is there a reason Postgresql doesn't just add the column
automatically? It does in other cases (like a missing table in a
join).

Thanks

--
Bill Moseley
moseley(at)hank(dot)org

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Responses

Browse pgsql-general by date

  From Date Subject
Next Message cbeahm 2005-09-16 19:12:04 Contraint Trigger Permissions
Previous Message Greg Stark 2005-09-16 18:39:14 Re: Asychronous database replication