Re: Default setting for enable_hashagg_disk

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>,Jeff Davis <pgsql(at)j-davis(dot)com>,Robert Haas <robertmhaas(at)gmail(dot)com>,David Rowley <dgrowleyml(at)gmail(dot)com>,Justin Pryzby <pryzby(at)telsasoft(dot)com>,Melanie Plageman <melanieplageman(at)gmail(dot)com>,Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>,"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-25 22:48:24
Message-ID: 85F9F4F0-918A-457F-BC9E-F2AC2EFC1F86@anarazel.de
Lists: pgsql-docs pgsql-hackers

Hi,

On June 25, 2020 3:44:22 PM PDT, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>On 2020-Jun-25, Andres Freund wrote:
>
>> > What are people doing for those cases already? Do we have any
>> > real-world queries that are a problem in PG 13 for this?
>>
>> I don't know about real world, but it's pretty easy to come up with
>> examples.
>>
>> query:
>> SELECT a, array_agg(b) FROM (SELECT generate_series(1, 10000)) a(a),
>> (SELECT generate_series(1, 10000)) b(b) GROUP BY a HAVING
>> array_length(array_agg(b), 1) = 0;
>>
>> work_mem = 4MB
>>
>> 12 18470.012 ms
>> HEAD 44635.210 ms
>>
>> HEAD causes ~2.8GB of file IO, 12 doesn't cause any. If you're IO
>> bandwidth constrained, this could be quite bad.
>
>... however, you can pretty much get the previous performance back by
>increasing work_mem. I just tried your example here, and I get 32
>seconds of runtime for work_mem 4MB, and 13.5 seconds for work_mem 1GB
>(this one spills about 800 MB); if I increase that again to 1.7GB I get
>no spilling and 9 seconds of runtime. (For comparison, 12 takes 15.7
>seconds regardless of work_mem).
>
>My point here is that maybe we don't need to offer a GUC to explicitly
>turn spilling off; it seems sufficient to let users change work_mem so
>that spilling will naturally not occur. Why do we need more?

That's not really a useful escape hatch, because it'll often lead to other nodes using more memory.
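
A rough sketch of the arithmetic behind that concern, assuming the documented behaviour that each sort or hash operation in a plan may use up to work_mem on its own. The function name and the four-node plan are hypothetical and only for illustration; the work_mem values are the ones tried upthread (1.7GB rounded to 1700MB).

```python
def worst_case_memory_mb(work_mem_mb, memory_nodes, sessions=1):
    """Upper bound in MB if every sort/hash node fills work_mem at once.

    This is an illustration only: real usage depends on the plan and data.
    """
    return work_mem_mb * memory_nodes * sessions


# Hypothetical plan with four memory-hungry nodes (for example one
# HashAggregate plus three sorts or hash joins), run by a single session.
for work_mem_mb in (4, 1024, 1700):
    bound = worst_case_memory_mb(work_mem_mb, memory_nodes=4)
    print(f"work_mem={work_mem_mb}MB -> up to {bound}MB for the query")
```

Raising work_mem far enough to keep the one hash aggregate in memory raises the ceiling for every other node in the same query, and for every other query run under that setting.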

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
