Re: Print correct startup cost for the group aggregate.

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Print correct startup cost for the group aggregate.
Date: 2017-03-06 09:25:03
Message-ID: CAFjFpRfvJEsStm3trC7h_GGDPC-YMqaXMjGUHo5e4QMMtYRr5g@mail.gmail.com
Lists: pgsql-hackers

> I understood your reasoning for why startup_cost = input_startup_cost, and
> not input_total_cost, for aggregation by sorting. But what I didn't understand
> is how a higher startup cost for aggregation by sorting would force hash
> aggregation to be chosen. I am not clear about this part.

See this comment in cost_agg():
* Note: in this cost model, AGG_SORTED and AGG_HASHED have exactly the
* same total CPU cost, but AGG_SORTED has lower startup cost. If the
* input path is already sorted appropriately, AGG_SORTED should be
* preferred (since it has no risk of memory overflow).

AFAIU, if the input is already sorted, aggregation by sorting and
aggregation by hashing have almost the same total cost, but the startup
cost of AGG_SORTED is lower than that of AGG_HASHED. Because of that lower
startup cost, add_path() chooses the AGG_SORTED path over the AGG_HASHED
path.
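
To make the tie-break concrete, here is a rough sketch of a fuzzy cost
comparison of the kind add_path() relies on. It is not the planner's actual
code: SketchPath, SketchResult, sketch_compare_costs() and the fuzz factor
passed in are hypothetical names and values chosen for illustration only.

```c
#include <assert.h>

/* Hypothetical stand-in for a planner path; only the two costs are kept. */
typedef struct SketchPath
{
    double startup_cost;    /* cost paid before the first row is returned */
    double total_cost;      /* cost paid to return every row */
} SketchPath;

typedef enum
{
    SKETCH_BETTER1,         /* first path wins */
    SKETCH_BETTER2,         /* second path wins */
    SKETCH_EQUAL,           /* no meaningful difference */
    SKETCH_DIFFERENT        /* each path wins on one of the two costs */
} SketchResult;

/*
 * Compare two paths, treating costs within a factor of "fuzz" as equal.
 * Total cost is looked at first; startup cost decides when the totals
 * are roughly the same.
 */
SketchResult
sketch_compare_costs(const SketchPath *p1, const SketchPath *p2, double fuzz)
{
    assert(fuzz >= 1.0);

    if (p1->total_cost > p2->total_cost * fuzz)
    {
        /* p1 is clearly worse on total cost; is it clearly cheaper to start? */
        if (p2->startup_cost > p1->startup_cost * fuzz)
            return SKETCH_DIFFERENT;
        return SKETCH_BETTER2;
    }
    if (p2->total_cost > p1->total_cost * fuzz)
    {
        /* p2 is clearly worse on total cost; same check the other way round */
        if (p1->startup_cost > p2->startup_cost * fuzz)
            return SKETCH_DIFFERENT;
        return SKETCH_BETTER1;
    }

    /* Total costs are roughly equal, so startup cost breaks the tie. */
    if (p1->startup_cost > p2->startup_cost * fuzz)
        return SKETCH_BETTER2;
    if (p2->startup_cost > p1->startup_cost * fuzz)
        return SKETCH_BETTER1;
    return SKETCH_EQUAL;
}
```

With made-up costs of {10, 110} for the sorted path and {105, 110} for the
hashed path, and a fuzz factor of 1.01, the totals tie and the function
returns SKETCH_BETTER1, i.e. the sorted path wins on startup cost alone.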

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
