Re: pg, mysql comparison with "group by" clause

From: Chris Travers <chris(at)travelamericas(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Anthony Molinaro <amolinaro(at)wgen(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Scott Marlowe <smarlowe(at)g2switchworks(dot)com>, Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Rick Schumeyer <rschumeyer(at)ieee(dot)org>, pgsql-sql(at)postgresql(dot)org
Subject: Re: pg, mysql comparison with "group by" clause
Date: 2005-10-20 18:31:40
Message-ID: 4357E28C.3070401@travelamericas.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Greg Stark wrote:

>"Anthony Molinaro" <amolinaro(at)wgen(dot)net> writes:
>
>
>
>>Greg,
>> You'll have to pardon me...
>>
>>I saw this comment:
>>
>>"I don't see why you think people stumble on this by accident.
>>I think it's actually an extremely common need."
>>
>>Which, if referring to the ability to have items in the select that do not
>>need to be included in the group, (excluding constants and the like) is just
>>silly.
>>
>>
>
>Well the "constants and the like" are precisely the point. There are plenty of
>cases where adding the column to the GROUP BY is unnecessary and since
>Postgres makes no attempt to prune them out, inefficient. And constants aren't
>the only such case. The most common case is columns that are coming from a
>table where the primary key is already included in the GROUP BY list.
>
>
I sort of see what you are saying.... but you have yet to convince me....

>In the case of columns coming from a table where the primary key is already in
>the GROUP BY list it's possible for the database to deduce that it's
>unnecessary to group on that column.
>
>
Well.... The question is really whether two things should be true:
1) whether you want to assume that the programmer is going to know
about Single/Multi Value Dependency issues per column. IMO, this is
more of a DB design issue than a client app issue. And I would *not*
want to make that assumption because for higher normal forms where this
is likely to be a consideration, you are likely to have denormalized
access via VIEWs anyway.

2) Whether you are willing to rely on looking at the data first to
determine whether the query is valid

Alternatively we are back to the ability to get the wrong answer with
ease and in very difficult to debug ways. I suspect that MySQL places
an implicit MIN() around columns not included in the group by
statement. I fail to see why this is not an appropriate answer to his
concern.

>But it's also possible to have cases where the programmer has out of band
>knowledge that it's unnecessary but the database doesn't have that knowledge.
>The most obvious case that comes to mind is a denormalized data model that
>includes a redundant column.
>
> select dept_id, dept_name, count(*) from employee_list
>
>
Ok. You have a few choices:
SELECT MIN(dept_id), dept_name, count(*) FROM employee_list GROUP BY
dept_name;
SELECT dept_id, dept_name, count(*) FROM employee_list GROUP BY
dept_name, dept_id;
SELECT dept_id, MIN(dept_name), count(*) FROM employee_list GROUP BY
dept_id;

And yes, it is bad design in every case I can think of.//
Why is this a problem?

Best Wishes,
Chris Travers
Metatron Technology Consulting

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Doug Quale 2005-10-20 20:29:22 Re: [pgsql-advocacy] Oracle buys Innobase
Previous Message Bryce Nesbitt (mailing list account) 2005-10-20 15:58:53 Re: casting character varying to integer - order by numeric