Re: Default setting for enable_hashagg_disk

From: Andres Freund <andres(at)anarazel(dot)de>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-25 20:36:29
Message-ID: 20200625203629.7m6yvut7eqblgmfo@alap3.anarazel.de
Lists: pgsql-docs pgsql-hackers

Hi,

On 2020-06-25 14:25:12 -0400, Bruce Momjian wrote:
> I am still trying to get my head around why the spill is going to be so
> much more work to adjust for with hash agg than with our other spillable
> nodes.

Aggregates are the classical way to process large amounts of data. For
larger amounts of data, sorted input (be it via an explicit sort or an
ordered index scan) isn't an attractive option; IOW, hash agg is the
common case. There are also fewer stats for even halfway accurately
estimating the number of groups and the size of the transition state -
a sort / hash join doesn't have an equivalent to the variably sized
transition value.

> What are people doing for those cases already? Do we have any
> real-world queries that are a problem in PG 13 for this?

I don't know about real-world ones, but it's pretty easy to come up with
examples.

query:
SELECT a, array_agg(b)
FROM (SELECT generate_series(1, 10000)) a(a),
     (SELECT generate_series(1, 10000)) b(b)
GROUP BY a
HAVING array_length(array_agg(b), 1) = 0;

work_mem = 4MB

12 18470.012 ms
HEAD 44635.210 ms

HEAD causes ~2.8GB of file IO, 12 doesn't cause any. If you're IO
bandwidth constrained, this could be quite bad.
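
If you want to see the spill volume directly, something like the
following should show it (a sketch; the exact wording of the EXPLAIN
output may differ):

SET work_mem = '4MB';
EXPLAIN (ANALYZE, COSTS OFF)
SELECT a, array_agg(b)
FROM (SELECT generate_series(1, 10000)) a(a),
     (SELECT generate_series(1, 10000)) b(b)
GROUP BY a
HAVING array_length(array_agg(b), 1) = 0;
-- on HEAD the HashAggregate node reports something like
-- "Batches: ... Memory Usage: ... Disk Usage: ..."

-- alternatively, log every temporary file that gets created
-- (the file size is included in the log line):
SET log_temp_files = 0;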

Obviously this is contrived, and a pretty extreme case. But imagine this
happening on a system where disk IO isn't super fast (e.g. at just about
any cloud provider), and it looks considerably worse.

An even more extreme version of the above is this:

query:
SELECT a, array_agg(b)
FROM (SELECT generate_series(1, 50000)) a(a),
     (SELECT generate_series(1, 10000)) b(b)
GROUP BY a
HAVING array_length(array_agg(b), 1) = 0;

work_mem = 16MB
12 81598.965 ms
HEAD 210772.360 ms

temporary tablespace on magnetic disk (RAID 0 of two 7.2k rpm server
spinning disks):

12 81136.530 ms
HEAD 225182.560 ms

The disks are busy in some periods, but still keep up. If, however, I
make the transition state a bit bigger:

query:
SELECT a, array_agg(b), count(c), max(d), max(e)
FROM (SELECT generate_series(1, 10000)) a(a),
     (SELECT generate_series(1, 5000)::text,
             repeat(random()::text, 10),
             repeat(random()::text, 10),
             repeat(random()::text, 10)) b(b, c, d, e)
GROUP BY a
HAVING array_length(array_agg(b), 1) = 0;

12 28164.865 ms

fast ssd:
HEAD 92520.680 ms

magnetic:
HEAD 183968.538 ms

(no reads, as there's plenty of memory; just writes, because the age /
amount thresholds for dirty data are reached)

In the magnetic case we're IO bottlenecked nearly the whole time.
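
The temp write volume is also visible in the stats - a sketch, assuming
you check pg_stat_database before and after a run and take the
difference:

SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_written
FROM pg_stat_database
WHERE datname = current_database();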

Just to be clear: I think this is a completely over-the-top example. But
I do think it shows the problem to some degree at least.

Greetings,

Andres Freund
