Re: pg, mysql comparison with "group by" clause

From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Rick Schumeyer <rschumeyer(at)ieee(dot)org>, pgsql-sql(at)postgresql(dot)org
Subject: Re: pg, mysql comparison with "group by" clause
Date: 2005-10-13 15:57:26
Message-ID: 1129219046.29961.184.camel@state.g2switchworks.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Wed, 2005-10-12 at 20:13, Greg Stark wrote:
> Scott Marlowe <smarlowe(at)g2switchworks(dot)com> writes:
>
> > Hehe. When I turn on my windshield wipers and my airbag deploys, is it
> > a documented "feature" if the dealership told me about this behaviour
> > ahead of time?
>
> Well it's more like my car where the dashboard dims when I turn on my
> headlights which annoys me to no end since I learned to always put my
> headlights on even in the day.

Sorry, but it's worse than that. It is quite possible that two people
could run this query at the same time and get different data from the
same set and the same point in time. That shouldn't happen accidentally
in SQL, you should know it's coming.

But it's not the same as the air bag deploying, it's like a different
random part of my car activates / deactivates each time. The horn, a
turn indicator, the trunk opens.

> > In much the same way, while this behaviour may be documented by MySQL, I
> > can't imagine it really being called a feature. But at least this
> > misbehaviour is documented. However, I think most people in the MySQL
> > universe just stumble onto it by accident when they try it and it works. I'd
> > at least prefer it to throw a warning or notice or something.
>
> I don't see why you think people stumble on this by accident. I think it's
> actually an extremely common need. So common that Postgres has the same
> feature (though less general) and invented a whole syntax to handle it.

Because I answer a boatload of questions on phpbuilder, where there are
tons of MySQL and PostgreSQL beginners who learn by stumbling around in
their database of choice.

Most MySQL users think that the select a,b,c from table group by a is a
valid query, and don't even realize that they are getting theoretically
different results each time. It's one of those many things they learn
wrong on MySQL and have to unlearn everywhere else.

They didn't go looking for this behaviour, and almost none of them
realized when they were doing it that they could get a different answer
each time.

> I think most MySQL users don't stumble on it, they learn it as the way to
> handle the common use case when you join a master table against a detail table
> and then want to aggregate all the detail records.

This isn't really common sense all the time though. It's more about the
law of unintended consequences. People write these queries, and never
realize that they are actually random responses coming back. And if
they aren't random responses, then their data likely isn't normalized.

> In standard SQL you have to
> write GROUP BY ... and list every single column you need from the master
> table. Forcing the database to do a lot of redundant comparisons and sort on
> uselessly long keys where in fact you only really need it to sort and group by
> the primary key.

But again, you're getting whatever row the database feels like giving
you. A use of a simple, stupid aggregate like an any() aggregate would
be fine here, and wouldn't require a lot of overhead, and would meet the
SQL spec.

The real reason this thing exists today and not an any() aggregate or
some equivalent in MySQL is because of all the legacy code using the
messed up group by syntax. It's hard to change that kind of stuff when
you've got a lot of market share to hold on to.

> Remember, most MySQL users learn MySQL first, and only later learn what is
> standard SQL and what isn't.

Hehe. I'll never forget, remember, I answer questions on databases on
phpbuilder. I'm always amazed at how many folks come there who are just
starting out and making the same mistakes we all made when beginning.

> > A Subselect would let you do such a thing as well, and while it's more
> > complicated to write, it is likely to be easier to tell just what it's
> > doing.
>
> Subselects have their own problems here. Mainly Postgres's optimizer, as good
> as it is, doesn't treat them with the same code paths as joins and can't find
> all the same plans for them. But in any case you cannot always write a
> subselect that's equivalent to an arbitrary join.

Actually, for things like aggregates, I've often been able to improve
performance with sub selects in PostgreSQL. Although, back in the 7.2
days it was still pretty pokey at that kind of stuff.

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Thomas F. O'Connell 2005-10-13 16:58:15 Re: Strange join...maybe some improvements???
Previous Message Shavonne Marietta Wijesinghe 2005-10-13 15:31:48