Re: Spilling hashed SetOps and aggregates to disk

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Spilling hashed SetOps and aggregates to disk
Date: 2018-06-05 17:47:49
Message-ID: 1528220869.2742.58.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2018-06-05 at 05:42 -0700, Andres Freund wrote:
> That's an interesting idea, but it seems simpler to stick to
> > hashing
> > rather than using a combination strategy. It also seems like it
> > would
> > take less CPU effort.
> Isn't the locality of access going to considerably better with the
> sort
> based approach?

I think I see what you mean, but without measuring it's hard to say.
You may be able to achieve similar locality in the hashing approach by
using a smaller hash table -- an effect that I think I observed
previously but I'd have to rerun the numbers.

> >
> > What advantages do you have in mind? My patch partitions the
> > spilled
> > data, so it should have similar disk costs as a sort approach.
> I think one part of it is that I think the amount of code is going to
> be
> lower - we essentially have already all the code to handle sort based
> aggs, and to have both sort and hash based aggs in the same query.
> We'd
> mostly need a way to scan the hashtable and stuff it into a
> tuplesort,
> that's not hard.  nodeAgg.c is already more than complex enough, I'm
> not
> sure that full blown partitioning is worth the cost.

The thing I like about your idea is that we wouldn't need to make a
choice at plan time, we just always do the hybrid hashing/sorting
unless we know that it's already sorted.

The thing I don't like about it is that it requires running two memory-
hungry operations at once. How much of work_mem do we use for sorted
runs, and how much do we use for the hash table?

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2018-06-05 17:51:14 Re: [PATCH] Trim trailing whitespace in vim and emacs
Previous Message Andres Freund 2018-06-05 17:46:12 Re: Variable-length FunctionCallInfoData