Skip site navigation (1) Skip section navigation (2)

Hash support for grouping sets

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Hash support for grouping sets
Date: 2017-01-06 03:48:58
Message-ID: 87vatszyhj.fsf@news-spur.riddles.org.uk (view raw, whole thread or download thread mbox)
Thread:
Lists: pgsql-hackers
Herewith a patch for doing grouping sets via hashing or mixed hashing
and sorting.

The principal objective is to pick whatever combination of grouping sets
has an estimated size that fits in work_mem, and minimizes the number of
sorting passes we need to do over the data, and hash those.  (Yes, this
is a knapsack problem.)

This is achieved by adding another strategy to the Agg node; AGG_MIXED
means that both hashing and (sorted or plain) grouping happens.  In
addition, support for multiple hash tables in AGG_HASHED mode is added.

Sample plans look like this:

explain select two,ten from onek group by cube(two,ten);
                          QUERY PLAN                          
--------------------------------------------------------------
 MixedAggregate  (cost=0.00..89.33 rows=33 width=8)
   Hash Key: two, ten
   Hash Key: two
   Hash Key: ten
   Group Key: ()
   ->  Seq Scan on onek  (cost=0.00..79.00 rows=1000 width=8)

explain select two,ten from onek group by two, cube(ten,twenty);
                          QUERY PLAN                           
---------------------------------------------------------------
 HashAggregate  (cost=86.50..100.62 rows=162 width=12)
   Hash Key: two, ten, twenty
   Hash Key: two, ten
   Hash Key: two
   Hash Key: two, twenty
   ->  Seq Scan on onek  (cost=0.00..79.00 rows=1000 width=12)

set work_mem='64kB';
explain select count(*) from tenk1
  group by grouping sets (unique1,thousand,hundred,ten,two);
                               QUERY PLAN                               
------------------------------------------------------------------------
 MixedAggregate  (cost=1535.39..3023.89 rows=11112 width=28)
   Hash Key: two
   Hash Key: ten
   Hash Key: hundred
   Group Key: unique1
   Sort Key: thousand
     Group Key: thousand
   ->  Sort  (cost=1535.39..1560.39 rows=10000 width=20)
         Sort Key: unique1
         ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=20)
(10 rows)

Columns with unhashable or unsortable data types are handled
appropriately, though obviously every individual grouping set must end
up containing compatible column types.

There is one current weakness which I don't see a good solution for: the
planner code still has to pick a single value for group_pathkeys before
planning the input path.  This means that we sometimes can't choose a
minimal set of sorts, because we may have chosen a sort order for a
grouping set that should have been hashed, for example:

explain select count(*) from tenk1
  group by grouping sets (two,unique1,thousand,hundred,ten);
                               QUERY PLAN                               
------------------------------------------------------------------------
 MixedAggregate  (cost=1535.39..4126.28 rows=11112 width=28)
   Hash Key: ten
   Hash Key: hundred
   Group Key: two
   Sort Key: unique1
     Group Key: unique1
   Sort Key: thousand
     Group Key: thousand
   ->  Sort  (cost=1535.39..1560.39 rows=10000 width=20)
         Sort Key: two
         ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=20)

(3 sorts, vs. 2 in the previous example that listed the same grouping
sets in different order)

Current status of the work: probably still needs cleanup, maybe some
better regression tests, and Andres has expressed some concern over the
performance impact of additional complexity in advance_aggregates; it
would be useful if this could be tested.  But it should be reviewable
as-is.

-- 
Andrew (irc:RhodiumToad)


Attachment: gshash10.patch
Description: text/x-patch (140.7 KB)

Responses

pgsql-hackers by date

Next:From: Jim NasbyDate: 2017-01-06 03:50:52
Subject: Re: Faster methods for getting SPI results
Previous:From: Tsunakawa, TakayukiDate: 2017-01-06 03:01:57
Subject: Re: Supporting huge pages on Windows

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group