Re: pg, mysql comparison with "group by" clause

From: "Anthony Molinaro" <amolinaro(at)wgen(dot)net>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Greg Stark" <gsstark(at)mit(dot)edu>
Cc: "Scott Marlowe" <smarlowe(at)g2switchworks(dot)com>, "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>, "Rick Schumeyer" <rschumeyer(at)ieee(dot)org>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: pg, mysql comparison with "group by" clause
Date: 2005-10-13 03:20:53
Message-ID: 3C6C2B281FD3E74C9F7C9D5B1EDA4582182613@wgexch01.wgenhq.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Tom,
Don't do it man...
It's not the same. The standard has it wrong (that or I misunderstood
your
explanation). But this statement:

> you shouldn't have to explicitly list the other columns from that
> table --- they are guaranteed to have unique values per group anyway

...is way off.
By changing the values in the select/group by you are changing
Group! How can you arbitrarily add or exclude a column?
You can't do it.

Look, all it takes is one example to prove the standard wrong...

Consider this simple example (in oracle, but the same in pg):

create table foo (id number primary key, name varchar2(10));

insert into foo values (1,'joe');

insert into foo values (2,'joe');

insert into foo values (3,'jim');

select * from foo;

ID NAME
---------- ----------
1 joe
2 joe
3 jim

ok, lets get the count by ID (which is unique):

select id, count(*) from foo group by id;

ID COUNT(*)
-- ----------
1 1
2 1
3 1

makes sense...

the values for ID, 1,2,3 are our groups
and the count aggregate op shows that there's
one member per group.

now let's get the count by name:

select name, count(*) from foo group by name;

NAME COUNT(*)
---------- ----------
jim 1
joe 2

so far so good. The queries and results are representative of the actual
data. The values for name, "jim" and "joe" are our groups
and the count aggregate op shows that there's 1 member in the "jim"
group and two members in the "joe" group.

But, as soon as we add id to the group by...

select name, count(*) from foo group by name,id;

NAME COUNT(*)
---------- ----------
jim 1
joe 1
joe 1

it changes the query! Even tho id is unique, it changes the query.
The group by it's definition (it's in the group by) is no longer
Name or id, it's both name and id.
How can you simply remove id? Which result set should it return
The first or second? Makes no sense...

If it's aint obvious why, simply plug id into the select:

select id, name, count(*) from foo group by name,id;

ID NAME COUNT(*)
--- ---------- ----------
3 jim 1
1 joe 1
2 joe 1

so, how can it be that you don't have to list the other columns
(in this case "name")? it makes no sense because if you remove the
either column from the group by (id or name) it changes the meaning of
the
query.
The way you guys do it now is correct. Mysql has it wrong.
And if the standard states that you can leave out the pk from a group
by,
They are wrong too, as the simple examples above prove.

Fyi, Oracle just bought innodb, so, I'd not be too concerned with mysql
and they so called "features" anyway.

Regards,
Anthony

-----Original Message-----
From: pgsql-sql-owner(at)postgresql(dot)org
[mailto:pgsql-sql-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
Sent: Wednesday, October 12, 2005 9:51 PM
To: Greg Stark
Cc: Scott Marlowe; Stephan Szabo; Rick Schumeyer;
pgsql-sql(at)postgresql(dot)org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

Greg Stark <gsstark(at)mit(dot)edu> writes:
> I think most MySQL users don't stumble on it, they learn it as the way
> to handle the common use case when you join a master table against a
> detail table and then want to aggregate all the detail records. In
> standard SQL you have to write GROUP BY ... and list every single
> column you need from the master table. Forcing the database to do a
> lot of redundant comparisons and sort on uselessly long keys where in
> fact you only really need it to sort and group by the primary key.

Actually, if you're grouping by a table's primary key, the SQL99 spec
says you shouldn't have to explicitly list the other columns from that
table --- they are guaranteed to have unique values per group anyway.
This is a single case in the "functional dependency" stuff. That
verbiage is incredibly dense and I don't think we want to tackle all of
it any time soon, but the primary-key case probably wouldn't be very
hard to implement. We really ought to have this in TODO ... I'm sure
it's been discussed before.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Greg Stark 2005-10-13 03:43:14 Re: pg, mysql comparison with "group by" clause
Previous Message Michael Glaesemann 2005-10-13 03:07:47 Re: Update timestamp on update