Re: pg, mysql comparison with "group by" clause

From: "Anthony Molinaro" <amolinaro(at)wgen(dot)net>
To: "Greg Stark" <gsstark(at)mit(dot)edu>, "Scott Marlowe" <smarlowe(at)g2switchworks(dot)com>
Cc: "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>, "Rick Schumeyer" <rschumeyer(at)ieee(dot)org>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: pg, mysql comparison with "group by" clause
Date: 2005-10-13 03:02:40
Message-ID: 3C6C2B281FD3E74C9F7C9D5B1EDA4582182612@wgexch01.wgenhq.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

> I don't see why you think people stumble on this by accident. I think
it's
> actually an extremely common need.

I can't imagine how that's a common need at all.

It makes no sense.

When you add an additional column in the select, it must be included in
the group by as it changes the meaning of the query.

Consider:

select deptno, count(*) from emp group by deptno;

DEPTNO COUNT(*)
---------- ----------
10 3
20 5
30 6

the query above counts the number of employees in each department.

Now consider the following query:

select deptno,job,count(*) from emp group by deptno,job;

DEPTNO JOB COUNT(*)
---------- --------- ----------
10 CLERK 1
10 MANAGER 1
10 PRESIDENT 1
20 CLERK 2
20 ANALYST 2
20 MANAGER 1
30 CLERK 1
30 MANAGER 1
30 SALESMAN 4

the query above counts the number of different job types in each
department.

In mysql, you would be allowed to do the following:

select deptno,job,count(*) from emp group by deptno;

but it makes no sense. What value would it return and what does it mean?
How can that possibly represent reliable data? What would the result set
above look like? It would be meaningless in a production system.
honestly.

It's a silly bug that mysql has touted has a feature
and I can't imagine why people think it's useful.

> I think most MySQL users don't stumble on it, they learn it as the way
to
> handle the common use case when you join a master table against a
detail
> table and then want to aggregate all the detail records.

Huh? I don't follow that at all....
Perhaps your confusing the concept with window functions that neither pg
nor mysql have implemented yet?
For example, using window functions allows you to return aggregate and
detailed info simultaneously:

select ename,
deptno,
job,
count(*)over(partition by deptno) as emps_per_dept,
count(*)over(partition by deptno,job) as job_per_dept,
count(*)over() as total
from emp

ENAME DEPTNO JOB EMPS_PER_DEPT JOB_PER_DEPT TOTAL
------ ------ --------- ------------- ------------ -----
MILLER 10 CLERK 3 1 14
CLARK 10 MANAGER 3 1 14
KING 10 PRESIDENT 3 1 14
SCOTT 20 ANALYST 5 2 14
FORD 20 ANALYST 5 2 14
SMITH 20 CLERK 5 2 14
ADAMS 20 CLERK 5 2 14
JONES 20 MANAGER 5 1 14
JAMES 30 CLERK 6 1 14
BLAKE 30 MANAGER 6 1 14
ALLEN 30 SALESMAN 6 4 14
MARTIN 30 SALESMAN 6 4 14
TURNER 30 SALESMAN 6 4 14
WARD 30 SALESMAN 6 4 14

But this is not a group by, this is aggregating and windowing,
which is quite different from mysql adding that nasty little bug
and calling it a feature.

- a

-----Original Message-----
From: pgsql-sql-owner(at)postgresql(dot)org
[mailto:pgsql-sql-owner(at)postgresql(dot)org] On Behalf Of Greg Stark
Sent: Wednesday, October 12, 2005 9:13 PM
To: Scott Marlowe
Cc: Greg Stark; Stephan Szabo; Rick Schumeyer; pgsql-sql(at)postgresql(dot)org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

Scott Marlowe <smarlowe(at)g2switchworks(dot)com> writes:

> Hehe. When I turn on my windshield wipers and my airbag deploys, is
it
> a documented "feature" if the dealership told me about this behaviour
> ahead of time?

Well it's more like my car where the dashboard dims when I turn on my
headlights which annoys me to no end since I learned to always put my
headlights on even in the day.

> In much the same way, while this behaviour may be documented by MySQL,
I
> can't imagine it really being called a feature. But at least this
> misbehaviour is documented. However, I think most people in the MySQL
> universe just stumble onto it by accident when they try it and it
works. I'd
> at least prefer it to throw a warning or notice or something.

I don't see why you think people stumble on this by accident. I think
it's
actually an extremely common need. So common that Postgres has the same
feature (though less general) and invented a whole syntax to handle it.

I think most MySQL users don't stumble on it, they learn it as the way
to
handle the common use case when you join a master table against a detail
table
and then want to aggregate all the detail records. In standard SQL you
have to
write GROUP BY ... and list every single column you need from the master
table. Forcing the database to do a lot of redundant comparisons and
sort on
uselessly long keys where in fact you only really need it to sort and
group by
the primary key.

Remember, most MySQL users learn MySQL first, and only later learn what
is
standard SQL and what isn't.

> A Subselect would let you do such a thing as well, and while it's more
> complicated to write, it is likely to be easier to tell just what it's
> doing.

Subselects have their own problems here. Mainly Postgres's optimizer, as
good
as it is, doesn't treat them with the same code paths as joins and can't
find
all the same plans for them. But in any case you cannot always write a
subselect that's equivalent to an arbitrary join.

--
greg

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Browse pgsql-sql by date

  From Date Subject
Next Message Michael Glaesemann 2005-10-13 03:07:47 Re: Update timestamp on update
Previous Message Stewart Ben (RBAU/EQS4) * 2005-10-13 02:56:53 Re: Update timestamp on update