Re: Memory-Bounded Hash Aggregation

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Memory-Bounded Hash Aggregation
Date: 2019-07-12 01:06:33
Message-ID: 9be86fc1adc315f69b8af4b379087ab451008d8a.camel@j-davis.com
Lists: pgsql-hackers

On Thu, 2019-07-11 at 17:55 +0200, Tomas Vondra wrote:
> Makes sense. I haven't thought about how the hybrid approach would be
> implemented very much, so I can't quite judge how complicated would it
> be to extend "approach 1" later. But if you think it's a sensible first
> step, I trust you. And I certainly agree we need something to compare
> the other approaches against.

Is this a duplicate of your previous email?

I'm slightly confused, but I'll use the opportunity to put out another
WIP patch. The patch could use a few rounds of cleanup and quality
work, but the functionality is there and the performance seems
reasonable.

I rebased on master, fixed a few bugs, and, most importantly, added
tests.

It seems to work fine with grouping sets. It will take a little longer
to get good performance numbers, but even with a group size of one, I'm
seeing HashAgg get close to Sort+Group in some cases.

You are right that the missed lookups appear to be costly, at least
when all the data fits in system memory. I think the cost comes from
CPU cache misses, because reducing work_mem sometimes improves
performance. I'll try tuning the number of buckets for the hash table
and see if that helps. If not, the performance still seems pretty good
to me.
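For readers following along, here is a minimal Python sketch of the general "spill on miss" idea behind a memory-bounded HashAgg. It is not the patch's code. It assumes a SUM aggregate, uses a group-count limit as a stand-in for work_mem, keeps spill partitions in lists instead of disk files, and the names bounded_hash_agg, max_groups and num_partitions are hypothetical. Once the table is full, a lookup that misses still pays for a hash probe before the tuple is spilled, which is the kind of cost discussed above.

```python
def bounded_hash_agg(rows, max_groups, num_partitions=4, depth=0):
    """SUM(value) GROUP BY key, holding at most max_groups groups in memory.

    rows: list of (key, value) pairs. Returns a dict mapping key -> sum.
    max_groups is a hypothetical stand-in for a work_mem byte limit.
    """
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")
    table = {}
    spills = [[] for _ in range(num_partitions)]  # in-memory stand-in for spill files
    for key, value in rows:
        if key in table:
            # Hit: advance the existing group's running sum.
            table[key] += value
        elif len(table) < max_groups:
            # Miss with room left: start a new group.
            table[key] = value
        else:
            # Miss with a full table: the probe was wasted work and the tuple
            # goes to a spill partition. Mixing depth into the hash gives each
            # recursion level a different split of the keys.
            spills[hash((depth, key)) % num_partitions].append((key, value))
    # The table never shrinks during a pass, so no spilled key is in it, and
    # each partition holds a disjoint set of keys; merge the results.
    result = dict(table)
    for partition in spills:
        if partition:
            result.update(
                bounded_hash_agg(partition, max_groups, num_partitions, depth + 1)
            )
    return result
```

Each recursive pass starts with an empty table, so at least one row is absorbed per pass and the recursion terminates. A real executor would size the limit in bytes and write partitions to temporary files; both are simplified away here.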

Of course, HashAgg can beat Sort for larger group sizes, but I'll try
to gather more data on the crossover point.
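To make the comparison concrete, a small Python sketch of the two strategies, assuming a SUM aggregate over (key, value) pairs; the function names are hypothetical and this is an illustration, not how the executor implements either node. Sort+Group orders every input row before aggregating adjacent runs, while HashAgg makes one pass and keeps one entry per distinct key.

```python
from itertools import groupby
from operator import itemgetter


def sort_group_agg(rows):
    """Sort+Group style: sort every input row by key, then sum each run of equal keys."""
    ordered = sorted(rows, key=itemgetter(0))
    return {key: sum(value for _, value in run)
            for key, run in groupby(ordered, key=itemgetter(0))}


def hash_agg(rows):
    """HashAgg style: a single pass that keeps one table entry per distinct key."""
    table = {}
    for key, value in rows:
        table[key] = table.get(key, 0) + value
    return table


rows = [(i % 3, i) for i in range(12)]  # 12 rows in 3 groups of 4 rows each
print(hash_agg(rows))  # {0: 18, 1: 22, 2: 26}
assert sort_group_agg(rows) == hash_agg(rows)
```

The general intuition, not a measured result from this thread: the sort must order all n input rows whatever the group count, while the hash table's size tracks the number of groups, so larger groups (fewer distinct keys) keep the table small and favor HashAgg.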

Regards,
Jeff Davis

Attachment: hashagg-20190711.patch (text/x-patch, 75.5 KB)
