Re: Spilling hashed SetOps and aggregates to disk

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date: 2018-06-05 05:46:47
Message-ID: 1528177607.2742.32.camel@j-davis.com
Lists: pgsql-hackers

On Tue, 2018-06-05 at 07:04 +0200, Tomas Vondra wrote:
> I expect the eviction strategy to be the primary design challenge of
> this patch. The other bits will be mostly determined by this one
> piece.

Not sure I agree that this is the primary challenge.

The cases that benefit from eviction are probably a minority. I see two
categories:

  * Natural clustering in the heap. This sounds fairly common, but a
lot of the cases that come to mind are too low-cardinality to be
compelling; e.g. timestamps grouped by hour/day/month. If someone has
run into a high-cardinality natural grouping case, let me know.
  * ARRAY_AGG (or similar): individual state values can be large enough
that we need to evict to avoid exceeding work_mem even if not adding
any new groups.

In either case, it seems like a fairly simple eviction strategy would
work. For instance, we could just evict the entire hash table if
work_mem is exceeded or if the hit rate on the hash table falls below a
certain threshold. If there were really something important that should
have stayed in the hash table, it will go back in soon anyway.

So why should eviction be a major driver for the entire design? I agree
it should be an area of improvement for the future, so let me know if
you see a major problem, but I haven't been as focused on eviction.

> While the primary goal of the patch is addressing the OOM risks in
> hash aggregate, I think it'd be a mistake to see it just that way. I
> see it could allow us to perform hash aggregate more often, even if
> we know the groups won't fit into work_mem. If we could estimate how
> much of the aggregate state we'll have to spill to disk, we could
> still prefer hashagg over groupagg. We would pay the price for
> eviction, but on large data sets that can be massively cheaper than
> having to do sort.

Agreed. I ran some tests of my patch in the last round, and they
strongly supported choosing HashAgg a lot more often. A lot of sort
improvements have been made though, so I really need to run some new
numbers.

> far away), but it would be unfortunate to make this improvement
> impossible/more difficult in the future.

If you see anything that would make this difficult in the future, let
me know.

Regards,
     Jeff Davis
