Re: Print correct startup cost for the group aggregate.

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Print correct startup cost for the group aggregate.
Date: 2017-03-03 04:26:56
Message-ID: CAFjFpRe9K1WOnYzbC_AjsJP2aKd0O=FU4eMEtVgj2MrGjKGYfg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 2, 2017 at 6:48 PM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Thu, Mar 2, 2017 at 6:06 PM, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com> wrote:
>> Hi,
>>
>> While reading through the cost_agg() I found that startup cost for the
>> group aggregate is not correctly assigned. Due to this explain plan is
>> not printing the correct startup cost.
>>
>> Without patch:
>>
>> postgres=# explain select aid, sum(abalance) from pgbench_accounts where
>> filler like '%foo%' group by aid;
>> QUERY PLAN
>> -------------------------------------------------------------------------------------
>> GroupAggregate (cost=81634.33..85102.04 rows=198155 width=12)
>> Group Key: aid
>> -> Sort (cost=81634.33..82129.72 rows=198155 width=8)
>> Sort Key: aid
>> -> Seq Scan on pgbench_accounts (cost=0.00..61487.89 rows=198155
>> width=8)
>> Filter: (filler ~~ '%foo%'::text)
>> (6 rows)
>>
>> With patch:
>>
>> postgres=# explain select aid, sum(abalance) from pgbench_accounts where
>> filler like '%foo%' group by aid;
>> QUERY PLAN
>> -------------------------------------------------------------------------------------
>> GroupAggregate (cost=82129.72..85102.04 rows=198155 width=12)
>> Group Key: aid
>> -> Sort (cost=81634.33..82129.72 rows=198155 width=8)
>> Sort Key: aid
>> -> Seq Scan on pgbench_accounts (cost=0.00..61487.89 rows=198155
>> width=8)
>> Filter: (filler ~~ '%foo%'::text)
>> (6 rows)
>>
>
> The reason the reason why startup_cost = input_startup_cost and not
> input_total_cost for aggregation by sorting is we don't need the whole
> input before the Group/Agg plan can produce the first row. But I think
> setting startup_cost = input_startup_cost is also not exactly correct.
> Before the plan can produce one row, it has to transit through all the
> rows belonging to the group to which the first row belongs. On an
> average it has to scan (total number of rows)/(number of groups)
> before producing the first aggregated row. startup_cost will be
> input_startup_cost + cost to scan (total number of rows)/(number of
> groups) rows + cost of transiting over those many rows. Total cost =
> startup_cost + cost of scanning and transiting through the remaining
> number of input rows.

The comment below from cost_agg() may be the reason why we don't
bother including the cost of aggregating first group in startup cost.
* Note: in this cost model, AGG_SORTED and AGG_HASHED have exactly the
* same total CPU cost, but AGG_SORTED has lower startup cost. If the
* input path is already sorted appropriately, AGG_SORTED should be
* preferred (since it has no risk of memory overflow). This will happen
* as long as the computed total costs are indeed exactly equal --- but if
* there's roundoff error we might do the wrong thing. So be sure that
* the computations below form the same intermediate values in the same
* order.

Particularly the last line implies that if we compute intermediate
values differently for the hash and sort aggregates, we have a risk of
floating point rounding errors. Including the cost of computing first
group's in the startup cost does exactly that. Probably having total
cost consistent is more important than having correct startup cost as
long as it's lesser than hash aggregate when the input is sorted per
grouping key.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-03-03 05:43:25 Re: SCRAM authentication, take three
Previous Message Amit Kapila 2017-03-03 04:14:11 Re: WAL Consistency checking for hash indexes