Re: cost_sort() improvements

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: cost_sort() improvements
Date:	2018-07-12 14:42:29
Message-ID:	ce8eff53-52f2-e7e6-0059-8527c3f2892d@sigaev.ru
Lists:	pgsql-hackers

> OK, so Fi is pretty much whatever CREATE FUNCTION ... COST says, right?
exactly

> Hmm, makes sense. But doesn't that mean it's mostly a fixed per-tuple
> cost, not directly related to the comparison? For example, why should it
> be multiplied by C0? That is, if I create a very expensive comparator
> (say, with cost 100), why should it increase the cost for transferring
> the tuple to CPU cache, unpacking it, etc.?
>
> I'd say those costs are rather independent of the function cost, and
> remain rather fixed, no matter what the function cost is.
>
> Perhaps you haven't noticed that, because the default funcCost is 1?
Maybe, but see my email
https://www.postgresql.org/message-id/ee14392b-d753-10ce-f5ed-7b2f7e277512%40sigaev.ru
about an additional term proportional to N

> The number of new magic constants introduced by this patch is somewhat
> annoying. 2.0, 1.5, 0.125, ... :-(
2.0 was removed in the last patch; 1.5 is still there and could be removed once I
understand your letter about group size estimation :)
0.125 should be checked, but I suppose we can't remove it at all because it is
the "average over whole word" constant.

>
>> - Final cost is cpu_operator_cost * N * sum(per column costs described
>>   above). Note, for single column with width <= sizeof(datum) and F1 = 1
>>   this formula gives exactly the same result as current one.
>> - for Top-N sort empiric is close to old one: use 2.0 multiplier as
>>   constant under log2, and use log2(Min(NGi, output_tuples)) for second
>>   and following columns.
>>
>
> I think compute_cpu_sort_cost is somewhat confused whether
> per_tuple_cost is directly a cost, or a coefficient that will be
> multiplied with cpu_operator_cost to get the actual cost.
>
> At the beginning it does this:
>
> per_tuple_cost = comparison_cost;
>
> so it inherits the value passed to cost_sort(), which is supposed to be
> cost. But then it does the work, which includes things like this:
>
> per_tuple_cost += 2.0 * funcCost * LOG2(tuples);
>
> where funcCost is pretty much pg_proc.procost. AFAIK that's meant to be
> a value in units of cpu_operator_cost. And at the end it does this
>
> per_tuple_cost *= cpu_operator_cost;
>
> I.e. it gets multiplied with another cost. That doesn't seem right.

Huh, you are right, will fix in v8.
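
To make the unit problem concrete, here is a minimal sketch, not the actual patch code. It assumes cpu_operator_cost at its default of 0.0025 and takes LOG2(tuples) as a precomputed argument; the function names are hypothetical. The second function shows only one possible way to keep the units consistent, and is not necessarily what v8 will do.

```c
#include <assert.h>

/* Stand-in for the cpu_operator_cost GUC (default value 0.0025). */
static const double cpu_operator_cost = 0.0025;

/*
 * Mixed units, following the three quoted statements: comparison_cost is
 * already a cost, the funcCost term is a coefficient in units of
 * cpu_operator_cost, and the final multiplication is applied to both, so
 * comparison_cost ends up multiplied by a second cost.
 */
static double
per_tuple_cost_mixed(double comparison_cost, double func_cost,
                     double log2_tuples)
{
    double      per_tuple_cost = comparison_cost;

    assert(log2_tuples >= 0.0);
    per_tuple_cost += 2.0 * func_cost * log2_tuples;
    per_tuple_cost *= cpu_operator_cost;
    return per_tuple_cost;
}

/*
 * One possible fix (an assumption): accumulate the coefficient separately,
 * convert it to cost units once, and only then add comparison_cost.
 */
static double
per_tuple_cost_consistent(double comparison_cost, double func_cost,
                          double log2_tuples)
{
    double      coefficient;

    assert(log2_tuples >= 0.0);
    coefficient = 2.0 * func_cost * log2_tuples;
    return comparison_cost + coefficient * cpu_operator_cost;
}
```

The two variants agree when comparison_cost is 0, which is probably why the problem is easy to miss; with a non-zero comparison_cost the mixed variant shrinks it by a factor of cpu_operator_cost.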

> Also, why do we need this?
>
> if (sortop != InvalidOid)
> {
>     Oid funcOid = get_opcode(sortop);
>
>     funcCost = get_func_cost(funcOid);
> }
Safety first :). Will remove.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

In response to

Re: cost_sort() improvements at 2018-07-08 23:22:15 from Tomas Vondra
