Skip site navigation (1) Skip section navigation (2)

Re: [HACKERS] No: implied sort with group by

From: darrenk(at)insightdist(dot)com (Darren King)
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] No: implied sort with group by
Date: 1998-01-28 17:52:18
Message-ID: 9801281752.AA75318@ceodev (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
> postgres=> select b,c,sum(a) from t1 group by b,c;
> b|c|sum
> -+-+---
>  |x|  5
>  |z|  3
>  |x|  0
> (3 rows)
> postgres=> select * from t1;
> a|b|c
> -+-+-
> 1| |x
> 2| |x
> 2| |x
> 3| |z
> 0| |x
> (5 rows)
> I just inserted a single out-of-order row at the end of the table which, since the
> integer value is zero, should have not affected the result. Sorry I didn't understand
> the nature of the test case.

The order of the implicit sort would be arbitrary, but should first sort on
any fields in a given ORDER BY to help speed things up later in the tree.

What are the effects of sorted or partially sorted input data to the sort code?

The current group/aggregate code seems to just loop over the tuples as they are.

I see two ways to fix the above, one w/minimal code, second w/more work, but
potentially better speed for large queries.

1.  Put a sort node immediately before the group node, taking into account
any user given ordering.  Also make sure the optimizer is aware of this sort
when calculating query costs.

2.  Instead of sorting the tuples before grouping, add a hashing system to
the group node so that the pre-sorting is not necessary. this a grouping problem or an aggregate problem?  Or both?  The first
query above should have the data sorted before aggregating, shouldn't it, or I
am still missing a piece of this puzzle?



pgsql-hackers by date

Next:From: ocieDate: 1998-01-28 18:42:54
Subject: Re: [HACKERS] No: implied sort with group by
Previous:From: Simon ShapiroDate: 1998-01-28 16:50:51
Subject: Re: [HACKERS] Domain Problem?

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group