Re: AGG_HASHED cost estimate

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: AGG_HASHED cost estimate
Date: 2017-04-20 10:46:18
Message-ID: CAFjFpReY-RbPsRyH7s8ApG6=Cv1KrGvK4j_FMFXJg4dbGu1wXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 20, 2017 at 7:47 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
> In cost_size.c, there is this comment block:
>
> + * Note: in this cost model, AGG_SORTED and AGG_HASHED have exactly
> the
> + * same total CPU cost, but AGG_SORTED has lower startup cost. If
> the
> + * input path is already sorted appropriately, AGG_SORTED should be
> + * preferred (since it has no risk of memory overflow). This will
> happen
> + * as long as the computed total costs are indeed exactly equal ---
> but
> + * if there's roundoff error we might do the wrong thing. So be
> sure
> + * that the computations below form the same intermediate values in
> the
> + * same order.
>
> But, why should they have the same cost in the first place? A sorted
> aggregation just has to do an equality comparison on each adjacent pair,
> which is costed as (cpu_operator_cost * numGroupCols) * input_tuples. A
> hash aggregation has to do a hashcode computation for each input, which
> apparently is also costed at (cpu_operator_cost * numGroupCols) *
> input_tuples.

I suspect we don't cost this.

> But it also has to do the equality comparison between the
> input tuple and any aggregation it is about to be aggregated into, to make
> sure the hashcode is not a false collision. That should be another
> (cpu_operator_cost * numGroupCols) * (input_tuples - numGroups), shouldn't
> it?

but cost this without numGroups.

/*
* The transCost.per_tuple component of aggcosts should be charged once
* per input tuple, corresponding to the costs of evaluating the aggregate
* transfns and their input expressions (with any startup cost of course
* charged but once). The finalCost component is charged once per output
* tuple, corresponding to the costs of evaluating the finalfns.
*
* If we are grouping, we charge an additional cpu_operator_cost per
* grouping column per input tuple for grouping comparisons.
*

The reason may be that hashing isn't as costly as a comparison. I
don't how true is that.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-04-20 11:27:03 Re: OK, so culicidae is *still* broken
Previous Message Suraj Kharage 2017-04-20 10:38:18 statement_timeout is not working as expected with postgres_fdw