Re: Spilling hashed SetOps and aggregates to disk

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date: 2018-06-05 05:46:47
Message-ID: 1528177607.2742.32.camel@j-davis.com
Lists: pgsql-hackers

On Tue, 2018-06-05 at 07:04 +0200, Tomas Vondra wrote:
> I expect the eviction strategy to be the primary design challenge of
> this patch. The other bits will be mostly determined by this one
> piece.

Not sure I agree that this is the primary challenge.

The cases that benefit from eviction are probably a minority. I see two
categories:

  * Natural clustering in the heap. This sounds fairly common, but a
lot of the cases that come to mind are too low-cardinality to be
compelling; e.g. timestamps grouped by hour/day/month. If someone has
run into a high-cardinality natural grouping case, let me know.
  * ARRAY_AGG (or similar): individual state values can be large enough
that we need to evict to avoid exceeding work_mem even if not adding
any new groups.

In either case, it seems like a fairly simple eviction strategy would
work. For instance, we could just evict the entire hash table if
work_mem is exceeded or if the hit rate on the hash table falls below a
certain threshold. If there were really something important that should
have stayed in the hash table, it will go back in soon anyway.

So why should eviction be a major driver for the entire design? I agree
it should be an area of improvement for the future, so let me know if
you see a major problem, but I haven't been as focused on eviction.

> While the primary goal of the patch is addressing the OOM risks in
> hash aggregate, I think it'd be a mistake to see it just that way. I
> see it could allow us to perform hash aggregate more often, even if
> we know the groups won't fit into work_mem. If we could estimate how
> much of the aggregate state we'll have to spill to disk, we could
> still prefer hashagg over groupagg. We would pay the price for
> eviction, but on large data sets that can be massively cheaper than
> having to do sort.

Agreed. I ran some tests of my patch in the last round, and they
strongly supported choosing HashAgg a lot more often. A lot of sort
improvements have been made though, so I really need to run some new
numbers.

> far away), but it would be unfortunate to make this improvement
> impossible/more difficult in the future.

If you see anything that would make this difficult in the future, let
me know.

Regards,
     Jeff Davis
