Quick Links

Re: Spilling hashed SetOps and aggregates to disk

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	David Gershuni <dgershun(at)cs(dot)cmu(dot)edu>
Cc:	Claudio Freire <klaussfreire(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Spilling hashed SetOps and aggregates to disk
Date:	2018-06-21 23:57:05
Message-ID:	1529625425.3820.81.camel@j-davis.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, 2018-06-21 at 13:44 -0700, David Gershuni wrote:
> To handle hash collisions, we can do the following:
>
> 1) We track the current hash code we’re processing, in ascending
> order.
>
> 2) Instead of combining one group at at time, we’ll maintain a list
> of
> all groups we’ve seen that match the current hash code.
>
> 3) When we peak at a spill file, if its next group’s hash code
> matches
> our current hash code, we’ll check to see if that group matches any
> of the
> groups in our list. If so, we’ll combine it with the matching group.
> If not,
> we’ll add this group to our list.
>
> 4) Once the head of each spill file exceeds the current hash code,
> we’ll
> emit our list as output, empty it, and advance to the next hash code.
>
> Does that seem reasonable?

It seems algorithmically reasonable but awfully complex. I don't think
that's a good path to take.

I still only see two viable approaches:

(1) Spill tuples into hash partitions: works and is a reasonably simple
extension of our existing code. This is basically what the last patch I
posted does (posted before grouping sets, so needs to be updated).

(2) Spill tuples into a sort: I like this approach because it can be
done simply (potentially less code than #1), and could be further
improved without adding a ton of complexity. It may even eliminate the
need for the planner to choose between hashagg and sort. The problem
here is data types that have a hash opclass but no btree opclass. I
might try to sidestep this problem by saying that data types with no
btree opclass will not obey work_mem.

Additionally, there are two other ideas that we might want to do
regardless of which approach we take:

* support evicting transition states from the hash table, so that we
can handle clustered input better

* refactor grouping sets so that phases are separate executor nodes so
that we can undo some of the complexity in nodeAgg.c

Regards,
Jeff Davis

In response to

Re: Spilling hashed SetOps and aggregates to disk at 2018-06-21 20:44:56 from David Gershuni

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Craig Ringer	2018-06-22 00:24:45	Re: PATCH: backtraces for error messages
Previous Message	Bruce Momjian	2018-06-21 23:46:35	Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)