Re: Hash grouping, aggregates

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruno Wolff III <bruno(at)wolff(dot)to>, Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash grouping, aggregates
Date: 2003-02-11 20:21:26
Message-ID: 1044994885.1607.5.camel@localhost.localdomain
Lists: pgsql-hackers

Tom Lane wrote on Tue, 11.02.2003 at 18:39:
> Bruno Wolff III <bruno(at)wolff(dot)to> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Greg Stark <gsstark(at)mit(dot)edu> writes:
> >>> The neat thing is that hash aggregates would allow grouping on data types that
> >>> have = operators but no useful < operator.
> >>
> >> Hm. Right now I think that would barf on you, because the parser wants
> >> to find the '<' operator to label the grouping column with, even if the
> >> planner later decides not to use it. It'd take some redesign of the
> >> query data structure (specifically SortClause/GroupClause) to avoid that.
>
> > I think another issue is that for some = operators you still might not
> > be able to use a hash. I would expect the discussion for hash joins in
> > http://developer.postgresql.org/docs/postgres/xoper-optimization.html
> > would apply to hash aggregates as well.
>
> Right, the = operator must be hashable or you're out of luck. But we
> could imagine tweaking the parser to allow GROUP BY if it finds a
> hashable = operator and no sort operator. The only objection I can see
> to this is that it means the planner *must* use hash aggregation, which
> might be a bad move if there are too many distinct groups.
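A minimal Python sketch of the point Greg and Tom make above: hash grouping needs only a hash function and equality, while sort-based grouping needs an ordering (`<`). Python stands in for the executor here; `Point`, `hash_group_count` and `sort_group_count` are hypothetical names for illustration, not PostgreSQL code.

```python
from collections import defaultdict
from itertools import groupby


class Point:
    """Hypothetical type with a hashable '=' but no '<' operator."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal points must hash equally, which is what makes '=' "hashable".
        return hash((self.x, self.y))


def hash_group_count(rows):
    """Hash aggregation: uses only hash() and ==, never <."""
    counts = defaultdict(int)
    for key in rows:
        counts[key] += 1
    return dict(counts)


def sort_group_count(rows):
    """Sort-based grouping: sort, then count runs of adjacent equal keys.
    sorted() needs '<' on the key type, so it fails for Point."""
    return {key: sum(1 for _ in run) for key, run in groupby(sorted(rows))}
```

With `rows = [Point(1, 2), Point(0, 0), Point(1, 2)]`, `hash_group_count(rows)` yields two groups (the `Point(1, 2)` group with count 2), while `sort_group_count(rows)` raises `TypeError` because `Point` defines no `<`.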

If we run out of sort memory, we can always bail out later, preferably
with a descriptive error message. It is not as elegant as erring out at
parse (or even plan/optimise) time, but the result is /almost/ the same.
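The run-time bail-out can be sketched as follows, assuming a simple cap on the number of distinct groups as a stand-in for the sort-memory budget. `hash_aggregate_sum`, `max_groups` and `TooManyGroupsError` are hypothetical names; a real executor would track memory use rather than a group count.

```python
class TooManyGroupsError(RuntimeError):
    """Hypothetical error raised when the in-memory hash table outgrows its budget."""


def hash_aggregate_sum(rows, max_groups):
    """SUM(value) GROUP BY key using a hash table.

    Bails out at execution time, with a descriptive message, once the
    number of distinct groups would exceed max_groups (assumed stand-in
    for the sort-memory limit)."""
    sums = {}
    for key, value in rows:
        if key not in sums:
            if len(sums) >= max_groups:
                raise TooManyGroupsError(
                    f"hash aggregation needs more than {max_groups} groups; "
                    "the grouping column has no sort operator, so there is "
                    "no sort-based fallback"
                )
            sums[key] = 0
        sums[key] += value
    return sums
```

The error surfaces only after some input has been read, which is the trade-off described above: the user still gets a clear message, just later than a parse-time or plan-time rejection would give it.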

Relying on hash aggregation will become essential if we are ever going
to implement the "other" groupings (CUBE, ROLLUP, (), ...), so it would
be nice if hash aggregation could also overflow to disk - I suspect that
this will still be faster than running an independent scan for each
GROUP BY grouping and merging the results.
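A sketch of the one-pass idea for the "other" groupings, assuming everything fits in memory: each grouping set gets its own hash table, and every input row updates all of them during a single scan. `rollup` and `hash_grouping_sets` are hypothetical helpers; the disk overflow asked for above (for example, partitioning the input into spill files) is left out.

```python
from collections import defaultdict


def rollup(columns):
    """ROLLUP(a, b) expands to the grouping sets (a, b), (a), ()."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


def hash_grouping_sets(rows, grouping_sets, value_col):
    """Compute SUM(value_col) for several grouping sets in one scan.

    rows: iterable of dicts; grouping_sets: list of tuples of column
    names. The empty tuple () is the grand total."""
    # One hash table per grouping set, keyed on that set's column values.
    tables = {gs: defaultdict(int) for gs in grouping_sets}
    for row in rows:  # single pass over the input
        for gs, table in tables.items():
            key = tuple(row[c] for c in gs)
            table[key] += row[value_col]
    return {gs: dict(t) for gs, t in tables.items()}
```

For example, `hash_grouping_sets(rows, rollup(("a", "b")), "v")` returns the per-(a, b), per-a, and grand-total sums from one pass over `rows`, instead of three separate scans whose results are then merged.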

-----
Hannu
