From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Default setting for enable_hashagg_disk |
Date: | 2020-04-09 11:48:55 |
Message-ID: | 20200409114855.h4hh7e2wt77p56sv@development |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-docs pgsql-hackers |
On Tue, Apr 07, 2020 at 05:39:01PM -0500, Justin Pryzby wrote:
>On Tue, Apr 07, 2020 at 11:20:46AM -0700, Jeff Davis wrote:
>> The enable_hashagg_disk GUC, if set to true, chooses HashAgg based on
>> costing. If false, it only generates a HashAgg path if it thinks it will fit
>> in work_mem, similar to the old behavior (though it wlil now spill to disk if
>> the planner was wrong about it fitting in work_mem). The current default is
>> true.
>
>Are there any other GUCs that behave like that ? It's confusing to me when I
>see "Disk Usage: ... kB", despite setting it to "disable", and without the
>usual disable_cost. I realize that postgres chose the plan on the hypothesis
>that it would *not* exceed work_mem, and that spilling to disk is considered
>preferable to ignoring the setting, and that "going back" to planning phase
>isn't a possibility.
>
It it really any different from our enable_* GUCs? Even if you do e.g.
enable_sort=off, we may still do a sort. Same for enable_groupagg etc.
>template1=# explain (analyze, costs off, summary off) SELECT a, COUNT(1) FROM generate_series(1,999999) a GROUP BY 1 ;
> HashAggregate (actual time=1370.945..2877.250 rows=999999 loops=1)
> Group Key: a
> Peak Memory Usage: 5017 kB
> Disk Usage: 22992 kB
> HashAgg Batches: 84
> -> Function Scan on generate_series a (actual time=314.507..741.517 rows=999999 loops=1)
>
>A previous version of the docs said this, which I thought was confusing, and you removed it.
>But I guess this is the behavior it was trying to .. explain.
>
>+ <term><varname>enable_hashagg_disk</varname> (<type>boolean</type>)
>+ ... This only affects the planner choice;
>+ execution time may still require using disk-based hash
>+ aggregation. The default is <literal>on</literal>.
>
>I suggest that should be reworded and then re-introduced, unless there's some
>further behavior change allowing the previous behavior of
>might-exceed-work-mem.
>
Yeah, it would be good to mention this is a best-effort setting.
>"This setting determines whether the planner will elect to use a hash plan
>which it expects will exceed work_mem and spill to disk. During execution,
>hash nodes which exceed work_mem will spill to disk even if this setting is
>disabled. To avoid spilling to disk, either increase work_mem (or set
>enable_hashagg=off)."
>
>For sure the release notes should recommend re-calibrating work_mem.
>
I don't follow. Why would the recalibrating be needed?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Pryzby | 2020-04-09 17:24:04 | Re: Default setting for enable_hashagg_disk |
Previous Message | PG Doc comments form | 2020-04-08 21:50:44 | Please provide examples of rows from |
From | Date | Subject | |
---|---|---|---|
Next Message | Dilip Kumar | 2020-04-09 11:49:06 | Re: Parallel copy |
Previous Message | tushar | 2020-04-09 11:46:53 | Re: [Proposal] Global temporary tables |