Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Олег Царев <zabivator(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Subject: Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)
Date: 2009-05-12 13:27:22
Message-ID: 603c8f070905120627g5446fe09wfd1300c49f5d8fef@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 12, 2009 at 2:21 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> Moreover, I guess you don't even need to buffer tuples to aggregate by
>> different keys. What you have to do is only to prepare more than one
>> hash tables (, or set up sort order if the plan detects hash table is
>> too large to fit in the memory), and one time seq scan will do. The
>> trans values are only to be stored in the memory, not the outer plan's
>> results. It will win greately in performance.
>
> it was my first solution. But I would to prepare one non hash method.
> But now I thinking about some special executor node, that fill all
> necessary hash parallel. It's special variant of hash agreggate.

I think HashAggregate will often be the fastest method of executing
this kind of operation, but it would be nice to have an alternative
(such as repeatedly sorting a tuplestore) to handle non-hashable
datatypes or cases where the HashAggregate would eat too much memory.

But that leads me to a question - does the existing HashAggregate code
make any attempt to obey work_mem? I know that the infrastructure is
present for HashJoin/Hash, but on a quick pass I didn't notice
anything similar in HashAggregate.

And on a slightly off-topic note for this thread, is there any
compelling reason why we have at least three different hash
implementations in the executor? HashJoin/Hash uses one for regular
batches and one for the skew batch, and I believe that HashAggregate
does something else entirely. It seems like it might improve code
maintainability, if nothing else, to unify these to the extent
possible.

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2009-05-12 14:34:10 Re: New trigger option of pg_standby
Previous Message Robert Haas 2009-05-12 13:19:30 Re: DROP TABLE vs inheritance