Re: hashagg slowdown due to spill changes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: hashagg slowdown due to spill changes
Date: 2020-06-15 13:34:03
Message-ID: 20200615133403.gt6x3musvjzvw24y@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 14, 2020 at 11:09:55PM -0700, Jeff Davis wrote:
>On Sun, 2020-06-14 at 11:14 -0700, Andres Freund wrote:
>> I'm somewhat inclined to think that we should revert 4cad2534da6 and
>> then look at how precisely to tackle this in 14.
>
>I'm fine with that.
>

I don't see how we could just revert 4cad2534d and leave this for v14.

The hashagg spilling is IMHO almost guaranteed to be a pain point for
some users, as it will force some queries to serialize large amounts of
data. Yes, some of this is a cost for hashagg enforcing work_mem at
runtime, I'm fine with that. We'd get reports about that too, but we can
justify that cost ...

But just reverting 4cad2534d will make this much worse, I think, as
illustrated by the benchmarks I did in [1]. And no, this is not really
fixable by tweaking the cost parameters - even with the current code
(i.e. 4cad2534d in place) I had to increase random_page_cost to 60 on
the temp tablespace (on SATA RAID) to get good plans with parallelism
enabled. I haven't tried, but I presume without 4cad2534d I'd have to
push r_p_c even further ...

[1] https://www.postgresql.org/message-id/20200519151202.u2p2gpiawoaznsv2%40development

>> It'd probably make sense to request small tlists when the number of
>> estimated groups is large, and not otherwise.
>
>That seems like a nice compromise that would be non-invasive, at least
>for create_agg_plan().
>

Maybe. It'd certainly better than nothing. It's not clear to me what
would a good threshold be, though. And it's not going to handle cases of
under-estimates.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-06-15 13:35:49 Re: Transactions involving multiple postgres foreign servers, take 2
Previous Message 李杰 (慎追) 2020-06-15 13:33:05 回复:回复:how to create index concurrently on partitioned table