Quick Links

Re: Memory-Bounded Hash Aggregation

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Taylor Vesely <tvesely(at)pivotal(dot)io>, Adam Lee <ali(at)pivotal(dot)io>, Melanie Plageman <mplageman(at)pivotal(dot)io>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Memory-Bounded Hash Aggregation
Date:	2020-02-13 05:51:12
Message-ID:	3bed775951a72dbdf8446c6750936e38dcf107b9.camel@j-davis.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, 2020-02-10 at 15:57 -0800, Jeff Davis wrote:
> Attaching latest version (combined logtape changes along with main
> HashAgg patch).

I ran a matrix of small performance tests to look for regressions.

The goal was to find out if the refactoring or additional branches
introduced by this patch caused regressions in in-memory HashAgg, Sort,
or the JIT paths. Fortunately, I didn't find any.

This is *not* supposed to represent the performance benefits of the
patch, only to see if I regressed somewhere else. The performance
benefits will be shown in the next round of tests.

I tried with JIT on/off, work_mem='4MB' and also a value high enough to
fit the entire working set, enable_hashagg on/off, and 4 different
tables.

The 4 tables are (each containing 20 million tuples):

t1k_20k_int4:
1K groups of 20K tuples each (randomly generated and ordered)
t20m_1_int4:
20M groups of 1 tuple each (randomly generated and ordered)
t1k_20k_text:
the same as t1k_20k_int4 but cast to text (collation C.UTF-8)
t20m_1_text:
the same as t20m_1_int4 but cast to text (collation C.UTF-8)

The query is:

select count(*) from (select i, count(*) from $TABLE group by i) s;

I just did 3 runs in psql and took the median result.

I ran against master (cac8ce4a, slightly older, before any of my
patches went in) and my dev branch (attached patch applied against
0973f560).

Results were pretty boring, in a good way. All results within the
noise, and about as many results were better on dev than master as
there were better on master than dev.

I also did some JIT-specific tests against only t1k_20k_int4. For that,
the hash table fits in memory anyway, so I didn't vary work_mem. The
query I ran included more aggregates to better test JIT:

select i, sum(i), avg(i), min(i)
from t1k_20k_int4
group by i
offset 1000000; -- offset so it doesn't return result

I know these tests are simplistic, but I also think they represent a
lot of areas where regressions could have potentially been introduced.
If someone else can find a regression, please let me know.

The new patch is basically just rebased -- a few other very minor
changes.

Regards,
Jeff Davis

Attachment	Content-Type	Size
hashagg-20200212-1.patch	text/x-patch	109.1 KB

In response to

Re: Memory-Bounded Hash Aggregation at 2020-02-10 23:57:21 from Jeff Davis

Responses

Re: Memory-Bounded Hash Aggregation at 2020-02-13 22:48:20 from Jeff Davis
Re: Memory-Bounded Hash Aggregation at 2020-02-14 21:53:22 from Jeff Davis

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Masahiko Sawada	2020-02-13 05:55:53	Re: error context for vacuum to include block number
Previous Message	Dilip Kumar	2020-02-13 04:52:33	Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager