Quick Links

Re: Spilling hashed SetOps and aggregates to disk

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Spilling hashed SetOps and aggregates to disk
Date:	2018-06-11 17:50:31
Message-ID:	1528739431.8818.36.camel@j-davis.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, 2018-06-11 at 11:55 -0400, Robert Haas wrote:
> performance degradation is not necessarily much better than OOM. I
> suspect that part of the reason why both Andres and David proposed
> algorithms that combined hashing and sorting is out of a desire to
> cater somewhat to both few-tuples-per-group and many-tuples-per-
> group.
>

I think average group size is the wrong way to look at it, because
averages are only useful for a normal distribution. The two most
interesting cases are:

1. Fairly uniform group size (e.g. normal dist)
2. Skew values, power law distribution, etc., where some groups are
very large and most are tiny by comparison. I am calling the very large
groups "skewed groups".

For #1, hashing and sorting are both reasonable, and it depends on a
lot of factors (number of groups, clustering, available memory).

For #2, hashing is really good and sorting is really bad. That's
because (at present) we sort all of the tuples before doing any
aggregation, so it expends a lot of effort on the skewed groups.
Hashing can keep skewed groups in memory and avoid spilling a large
fraction of the input tuples at all.

If we get the skewed groups into the hash table, and avoid OOM, I think
that is a success. My patch did that, except it didn't account for two
cases:

(a) clustered input, where skewed groups may not appear until the
hash table is already full; and
(b) variable-sized transition values that grow with the group size

> One big advantage to just partitioning the input tuples by hash code
> and then proceeding batch by batch is that it works for any aggregate
> that can support hash aggregation in the first place. You do not

Agreed, but I think we should evaluate Andres's idea of feeding spilled
tuples to a Sort, because the overall complexity might be lower even
accounting for the special cases you're worried about.

Also, I am not sure we should really be designing around data types
where it makes sense to group and then don't supply a btree opclass.
Seems like they are likely to hit a problem soon anyway.

Regards,
Jeff Davis

In response to

Re: Spilling hashed SetOps and aggregates to disk at 2018-06-11 15:55:02 from Robert Haas

Responses

Re: Spilling hashed SetOps and aggregates to disk at 2018-06-11 17:59:35 from Tom Lane
Re: Spilling hashed SetOps and aggregates to disk at 2018-06-13 14:08:47 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2018-06-11 17:59:35	Re: Spilling hashed SetOps and aggregates to disk
Previous Message	Andres Freund	2018-06-11 17:34:55	Re: commitfest 2018-07