From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Subject: | Re: Disk-based hash aggregate's cost model |
Date: | 2020-08-29 01:32:38 |
Message-ID: | 888b956ca998e534e8c81a3d23da0879855b4d96.camel@j-davis.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, 2020-08-27 at 17:28 -0700, Peter Geoghegan wrote:
> We have a Postgres 13 open item for Disk-based hash aggregate, which
> is the only non-trivial open item. There is a general concern about
> how often we get disk-based hash aggregation when work_mem is
> particularly low, and recursion seems unavoidable. This is generally
> thought to be a problem in the costing.
We discussed two approaches to tweaking the cost model:
1. Penalize HashAgg disk costs by a constant amount. It seems to be
chosen a little too often, and we can reduce the number of plan
changes.
2. Treat recursive HashAgg spilling skeptically, and heavily penalize
recursive spilling.
The problem with approach #2 is that we have a default hash mem of 4MB,
and real systems have a lot more than that. In this scenario, recursive
spilling can beat Sort by a lot.
For instance:
Data:
create table text10m(t text collate "C.UTF-8", i int, n numeric);
insert into t10m
select s.g::text, s.g, s.g::numeric
from (
select (random()*1000000000)::int as g
from generate_series(1,10000000)) s;
vacuum (freeze,analyze) text10m;
Query: explain analyze select distinct t from text10m;
HashAgg: 10.5s
Sort+Distinct: 46s
I'm inclined toward option #1 for simplicity unless you feel strongly
about option #2. Specifically, I was thinking of a 1.5X penalty for
HashAgg disk costs.
Regards,
Jeff Davis
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2020-08-29 01:43:18 | Re: More aggressive vacuuming of temporary tables |
Previous Message | Andy Fan | 2020-08-28 22:38:36 | Re: Improve planner cost estimations for alternative subplans |