Quick Links

AGG_HASHED cost estimate

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	AGG_HASHED cost estimate
Date:	2017-04-20 02:17:12
Message-ID:	CAMkU=1xKoOkh04=6p03+5vaSGaWYuJN-wnt0+eR2EC0cnnWedg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

In cost_size.c, there is this comment block:

+ * Note: in this cost model, AGG_SORTED and AGG_HASHED have exactly
the
+ * same total CPU cost, but AGG_SORTED has lower startup cost. If
the
+ * input path is already sorted appropriately, AGG_SORTED should be
+ * preferred (since it has no risk of memory overflow). This will
happen
+ * as long as the computed total costs are indeed exactly equal ---
but
+ * if there's roundoff error we might do the wrong thing. So be
sure
+ * that the computations below form the same intermediate values in
the
+ * same order.

But, why should they have the same cost in the first place? A sorted
aggregation just has to do an equality comparison on each adjacent pair,
which is costed as (cpu_operator_cost * numGroupCols) * input_tuples. A
hash aggregation has to do a hashcode computation for each input, which
apparently is also costed at (cpu_operator_cost * numGroupCols) *
input_tuples. But it also has to do the equality comparison between the
input tuple and any aggregation it is about to be aggregated into, to make
sure the hashcode is not a false collision. That should be
another (cpu_operator_cost * numGroupCols) * (input_tuples - numGroups),
shouldn't it? Currently, that is apparently assumed to be free.

Is there something here I am overlooking?

Cheers,

Jeff

Responses

Re: AGG_HASHED cost estimate at 2017-04-20 10:46:18 from Ashutosh Bapat

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Noah Misch	2017-04-20 03:00:05	Re: pg_dump emits ALTER TABLE ONLY partitioned_table
Previous Message	Michael Paquier	2017-04-20 00:44:10	Re: Continuous buildfarm failures on hamster with bin-check