From: | Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: AGG_HASHED cost estimate |
Date: | 2017-04-20 10:46:18 |
Message-ID: | CAFjFpReY-RbPsRyH7s8ApG6=Cv1KrGvK4j_FMFXJg4dbGu1wXQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Apr 20, 2017 at 7:47 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
> In cost_size.c, there is this comment block:
>
> + * Note: in this cost model, AGG_SORTED and AGG_HASHED have exactly
> the
> + * same total CPU cost, but AGG_SORTED has lower startup cost. If
> the
> + * input path is already sorted appropriately, AGG_SORTED should be
> + * preferred (since it has no risk of memory overflow). This will
> happen
> + * as long as the computed total costs are indeed exactly equal ---
> but
> + * if there's roundoff error we might do the wrong thing. So be
> sure
> + * that the computations below form the same intermediate values in
> the
> + * same order.
>
> But, why should they have the same cost in the first place? A sorted
> aggregation just has to do an equality comparison on each adjacent pair,
> which is costed as (cpu_operator_cost * numGroupCols) * input_tuples. A
> hash aggregation has to do a hashcode computation for each input, which
> apparently is also costed at (cpu_operator_cost * numGroupCols) *
> input_tuples.
I suspect we don't cost this.
> But it also has to do the equality comparison between the
> input tuple and any aggregation it is about to be aggregated into, to make
> sure the hashcode is not a false collision. That should be another
> (cpu_operator_cost * numGroupCols) * (input_tuples - numGroups), shouldn't
> it?
but cost this without numGroups.
/*
* The transCost.per_tuple component of aggcosts should be charged once
* per input tuple, corresponding to the costs of evaluating the aggregate
* transfns and their input expressions (with any startup cost of course
* charged but once). The finalCost component is charged once per output
* tuple, corresponding to the costs of evaluating the finalfns.
*
* If we are grouping, we charge an additional cpu_operator_cost per
* grouping column per input tuple for grouping comparisons.
*
The reason may be that hashing isn't as costly as a comparison. I
don't how true is that.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2017-04-20 11:27:03 | Re: OK, so culicidae is *still* broken |
Previous Message | Suraj Kharage | 2017-04-20 10:38:18 | statement_timeout is not working as expected with postgres_fdw |