Re: Hash support for grouping sets

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash support for grouping sets
Date: 2017-01-17 16:59:36
Message-ID: CA+TgmoY9=yeTft4gKxCZyn4X-nwJmWUd72Og-CZLYS0ki0E0pw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 16, 2017 at 10:59 AM, Finnerty, Jim <jfinnert(at)amazon(dot)com> wrote:
> Using hashed aggregation within sorted groups, when the order of the input stream allows it, is potentially a useful way to improve aggregation performance more generally. This would be beneficial when the input is expected to be larger than the working memory available for hashed aggregation, but there is enough memory to hash-aggregate just the unsorted grouping key combinations, and the cumulative cost of rebuilding the hash table for each sorted subgroup is less than the cost of sorting the entire input. In other words, if most of the grouping key combinations are already segregated by virtue of the input order, then the remaining combinations within each sorted group can be hashed in memory, at the cost of rebuilding the hash table for each sorted subgroup.

Neat idea.
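
For concreteness, here is a minimal sketch of the mixed strategy described above. It is not taken from the patch. It assumes the input is already ordered on a leading key, and the names (mixed_hash_sort_aggregate, sorted_key, hashed_key, value) are hypothetical. A fresh hash table is built for each run of equal leading-key values, so memory is bounded by the number of distinct remaining keys within one run rather than across the whole input.

```python
from collections import defaultdict
from itertools import groupby


def mixed_hash_sort_aggregate(rows, sorted_key, hashed_key, value):
    """Sum value(row) per (sorted_key, hashed_key) pair.

    Assumes rows arrive ordered by sorted_key; the remaining grouping
    columns (hashed_key) may appear in any order within each run.
    """
    # groupby only merges adjacent equal keys, which is exactly what a
    # sorted input stream provides.
    for prefix, run in groupby(rows, key=sorted_key):
        table = defaultdict(int)  # hash table rebuilt per sorted subgroup
        for row in run:
            table[hashed_key(row)] += value(row)
        # Emit this subgroup's results before moving on, so the table
        # for the previous run can be discarded.
        for key, total in table.items():
            yield (prefix, key, total)


# Hypothetical input: already sorted on column 0, unsorted on column 1.
rows = [
    ("east", "a", 1), ("east", "b", 2), ("east", "a", 3),
    ("west", "b", 4), ("west", "b", 5),
]
result = list(
    mixed_hash_sort_aggregate(
        rows,
        sorted_key=lambda r: r[0],
        hashed_key=lambda r: r[1],
        value=lambda r: r[2],
    )
)
```

Whether this beats a full sort depends on the trade-off stated in the quoted paragraph: the cumulative cost of rebuilding the table once per run against the cost of sorting the entire input.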

> I haven’t looked at the code for this change yet (I hope I will have the time to do that). Ideally, the choice among sorted, hashed, or mixed hash/sort aggregation should be integrated into the cost model, but given the notorious difficulty of estimating intermediate cardinalities accurately, it would be difficult to develop a cardinality model and a cost model accurate enough to choose among these options consistently well.

Yes, that might be a little tricky.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
