Re: [HACKERS] No: implied sort with group by

From: darrenk(at)insightdist(dot)com (Darren King)
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] No: implied sort with group by
Date: 1998-01-28 17:52:18
Message-ID: 9801281752.AA75318@ceodev
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> postgres=> select b,c,sum(a) from t1 group by b,c;
> b|c|sum
> -+-+---
> |x| 5
> |z| 3
> |x| 0
> (3 rows)
>
> postgres=> select * from t1;
> a|b|c
> -+-+-
> 1| |x
> 2| |x
> 2| |x
> 3| |z
> 0| |x
> (5 rows)
>
> I just inserted a single out-of-order row at the end of the table which, since the
> integer value is zero, should have not affected the result. Sorry I didn't understand
> the nature of the test case.

The order of the implicit sort would be arbitrary, but should first sort on
any fields in a given ORDER BY to help speed things up later in the tree.

What are the effects of sorted or partially sorted input data to the sort code?

The current group/aggregate code seems to just loop over the tuples as they are.

I see two ways to fix the above, one w/minimal code, second w/more work, but
potentially better speed for large queries.

1. Put a sort node immediately before the group node, taking into account
any user given ordering. Also make sure the optimizer is aware of this sort
when calculating query costs.

2. Instead of sorting the tuples before grouping, add a hashing system to
the group node so that the pre-sorting is not necessary.

Hmmm...is this a grouping problem or an aggregate problem? Or both? The first
query above should have the data sorted before aggregating, shouldn't it, or I
am still missing a piece of this puzzle?

darrenk

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message ocie 1998-01-28 18:42:54 Re: [HACKERS] No: implied sort with group by
Previous Message Simon Shapiro 1998-01-28 16:50:51 Re: [HACKERS] Domain Problem?