Re: [NOVICE] Question About Aggregate Functions

From: "Don Parris" <parrisdc(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: [NOVICE] Question About Aggregate Functions
Date: 2006-09-13 16:49:34
Message-ID: 1eba300b0609130949q12f401c4kad1a9fc65d80431@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-novice

On 9/13/06, Brandon Aiken <BAiken(at)winemantech(dot)com> wrote:
>
> Ah, I did not know what was in your fields, so I did not assume they were
> Boolean values. It looked to me like you were trying to use IS TRUE to
> substitute for the lack of a GROUP BY, so I didn't know what to do.
>

That was in the first paragraph of my OP. "How do I create a query that
(1) evaluates each boolean field for TRUE/FALSE and (2) counts the number of
rows where each field is TRUE?" Maybe you just hadn't had your first cup of
coffee? ;-) Seriously, though, I really do appreciate your help.

Yes, count() will include all non-NULL values. Sorry if I sounded unclear
> there.
>
>
>
> If you do typecasting the value zero is false (and non-zero is true).
> NULL in an expression always returns NULL, and many programs will interpret
> that result as false. So I'm not sure of what results you might get with a
> Boolean test against a non-Boolean field, especially if it's an integer
> field.
>
>
>
> postgres=# select 0::boolean = FALSE;
>
> ?column?
>
> ----------
>
> t
>
> (1 row)
>
>
>
> You should just be able to take the previous query and add in your WHERE
> clauses:
>
>
>
> SELECT count(t1.fielda), count(t2.fielda), count(t2.fieldb), AVG(t2.fieldc
> )
>
> FROM t1 JOIN t2 ON t1.item_id = t2.item_id
>
> WHERE t1.fielda = TRUE AND t2.fielda = TRUE AND t2.fieldb = TRUE
>
> GROUP BY NULL;
>
>
>
> Now, the INNER JOIN you're using is only selecting fields where both
> t1.item_id and t2.item_id exist and the respective fields are TRUE. That
> is, it's only going to run the count and average functions against the
> results of this query:
>
> SELECT *
>
> FROM t1 JOIN t2 ON ON t1.item_id = t2.item_id
>
> WHERE t1.fielda = TRUE AND t2.fielda = TRUE AND t2.fieldb = TRUE;
>
>
>
> If that's what you want, that's great.
>

Can I use OR instead of AND here?

However, you might want a count of each field where that field is TRUE. In
> that case, I would use either temporary tables, compound queries and derived
> tables, or multiple simple queries.
>

It's also possible that you might want a count of fields where t1.item_idand
> t2.item_id exist, but where only each respective field is TRUE. That is,
> you want a count of t1.fielda where it is TRUE no matter what t2.fieldaand
> t2.fieldb are as long as t1.item_id matches t2.item_id. In that case you
> have to do even more joins, and that could take a fair bit of time
> especially if you haven't indexed your item_id fields.
>
Well this sounds more like what I want. Given t2.fielda, t2.fieldb,
t2.fieldc, any one (or all three) could be true, but frequently at least one
of the fields is false. Initially, all of the fields might be unknown (thus
NULL) for a given item until I am able to investigate the items to determine
TRUE/FALSE. I frequently have items that are inactive, and thus unable to
determine any of attributes in t2.

My end result needs to be a count of all the values in each field where the
value is TRUE, as opposed to FALSE or NULL.

> You really have to look at your result sets. Sometimes it is better to
> run multiple simple queries instead of one big complex query to be sure
> you're getting the data you want and the query executes in a reasonable
> amount of time.
>
>
>
> Also, consider that NULL values are generally considered bad to
> purposefully enter. Logically, It would be better to create one table for
> each field and then create a record for each item_id as you need it so you
> never have NULLs. The problem with that is one of performance if you end up
> doing large number of JOINs. In that case, it might be better to use
> integers instead of Boolean fields, since you have three explicit states of
> TRUE, FALSE, and NOT YET DETERMINED.
>
Regarding the NULL Values:
I have been thinking that I might want to leave my NULLs as they are.
However, I will be concentrating mostly on the items that are active.
Inactive items are only counted as part of the total number of items. Their
attributes are a moot point, but technically FALSE. (If they are inactive,
the attributes are no longer TRUE in any case.)

I am counting only those fields that I know (through verification) to be
TRUE. I can use FALSE where the actual value is not known, and just change
the attribute to TRUE when I discover that to be the case. I just need to
be sure in my counts that I note the fact that FALSE values include the
unverified values (meaning some of those might actually be true). Does that
sound fairly logical to you?

Regarding the Table Layout:
The columns in t2 are ordered, essentially according to the category of
attributes involved. I had thought about using multiple tables, one for
each category of attributes. However, I chose a monolithic table for the
attributes to represent a single survey of each item. Each item might be
surveyed again in the future to determine any changes, which would introduce
a new version of the current table.

I'll tinker around with the queries a bit, and see what I come up with.

Thanks for the input.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Brandon Aiken 2006-09-13 18:23:36 Re: [NOVICE] Question About Aggregate Functions
Previous Message Andrus 2006-09-13 16:13:56 Garbage data sent by Windows 98 client

Browse pgsql-novice by date

  From Date Subject
Next Message Brandon Aiken 2006-09-13 18:23:36 Re: [NOVICE] Question About Aggregate Functions
Previous Message Lan Barnes 2006-09-13 16:28:59 Re: pg_dump password