Re: [HACKERS] No: implied sort with group by

From: darrenk(at)insightdist(dot)com (Darren King)
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] No: implied sort with group by
Date: 1998-01-27 17:08:45
Message-ID: 9801271708.AA21938@ceodev
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > > This is what I think is missing or broken right now.
> > >
> > > > > select * from t1;
> > > > a b c
> > > > 1 x
> > > > 2 x
> > > > 3 z
> > > > 2 x
> > > >
> > > > 4 row(s) retrieved.
> > > > > select b,c,sum(a) from t1 group by b,c;
> > > > b c (sum)
> > > >
> > > > x 5
> > > > z 3
> > > >> 2 row(s) retrieved.
> >
> > Sorry, I've lost the thread. What is broken? I get this same result, and
> > (assuming that column "b" is full of nulls) I think this the correct result.
>
> At one point, it was thought that NULLs shouldn't be grouped, but I
> backed out the patch. There is a problem with GROUP BY on large
> datasets, and Vadim knows the cause, and will work on it later.

Different from the grouping by NULLs issue...

The above results are from Sybase. If these same four rows are inserted into
postgres, the second query will return three rows. Something like...

b|c|sum(a)
|x|3
|z|3
|x|2

It does this not because of the null values of column b, but because the data is
not sorted before getting to the group by node if the user does not explicitly put
an order by in the query. IMHO, postgres should put an arbitrary sort node in the
tree so that the data can be properly grouped as the group node iterates over it.

And even if I put an "order by c" clause in there, I still get three rows, they're
just properly sorted. :)

darrenk

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 1998-01-27 18:23:36 deadlock detection
Previous Message Thomas G. Lockhart 1998-01-27 17:03:31 Re: [HACKERS] Re: Re: [PORTS] the 'money' type