From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|

To: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |

Subject: | cost_sort() improvements |

Date: | 2018-06-28 16:47:22 |

Message-ID: | 803ccdce-39ef-f1c3-3c65-c79959f7081d@sigaev.ru |

Views: | Raw Message | Whole Thread | Download mbox | Resend email |

Thread: | |

Lists: | pgsql-hackers |

Hi!

Current estimation of sort cost has following issues:

- it doesn't differ one and many columns sort

- it doesn't pay attention to comparison function cost and column width

- it doesn't try to count number of calls of comparison function on per column

basis

At least [1] and [2] hit into to that issues and have an objections/questions

about correctness of cost sort estimation. Suggested patch tries to improve

current estimation and solve that issues.

Basic ideas:

- let me introduce some designations

i - index of column, started with 1

N - number of tuples to sort

Gi - number of groups, calculated by i number columns. Obviously,

G0 == 1. Number of groups is caculated by estimate_num_groups().

NGi - number of tuples in one group. NG0 == N and NGi = N/G(i-1)

Fi - cost of comparison function call of i-th column

Wi - average width in bytes of 1-ith column.

Ci - total cost of one comparison call of column calculated as

Fi * fine_function(Wi) where fine_function(Wi) looks like:

if (Wi <= sizeof(Datum))

return 1.0; //any type passed in Datum

else

return 1 + log16(Wi/sizeof(Datum));

log16 just an empirical choice

- So, cost for first column is 2.0 * C0 * log2(N)

First column is always compared, multiplier 2.0 is defined per regression

test. Seems, it estimates a cost for transferring tuple to CPU cache,

starting cost for unpacking tuple, cost of call qsort compare wrapper

function, etc. Removing this multiplier causes too optimistic prediction of

cost.

- cost for i-th columns:

Ci * log2(NGi) => Ci * log2(N/G(i-1))

So, here I believe, that i-th comparison function will be called only inside

group which number is defined by (i-1) leading columns. Note, following

discussion [1] I add multiplier 1.5 here to be closer to worst case when

groups are significantly differ. Right now there is no way to determine how

differ they could be. Note, this formula describes cost of first column too

except difference with multiplier.

- Final cost is cpu_operator_cost * N * sum(per column costs described above).

Note, for single column with width <= sizeof(datum) and F1 = 1 this formula

gives exactly the same result as current one.

- for Top-N sort empiric is close to old one: use 2.0 multiplier as constant

under log2, and use log2(Min(NGi, output_tuples)) for second and following

columns.

I believe, suggested method could be easy enhanced to support [1] and used in [2].

Open items:

- Fake Var. I faced with generate_append_tlist() which generates Var with

varno = 0, it was invented, as I can see, at 2002 with comment 'should be

changed in future'. Future hasn't yet come... I've added workaround to

prevent call estimate_num_group() with such Vars, but I'm not sure that is

enough.

- Costs of all comparison functions now are the same and equal to 1. May be,

it's time to change that.

- Empiric constants. I know, it's impossible to remove them at all, but, may

be, we need to find better estimation of them.

[1] https://commitfest.postgresql.org/18/1124/

[2] https://commitfest.postgresql.org/18/1651/

--

Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru

WWW: http://www.sigaev.ru/

Attachment | Content-Type | Size |
---|---|---|

cost_sort-v6.patch | text/x-patch | 9.5 KB |

- Re: cost_sort() improvements at 2018-06-28 19:30:01 from Peter Geoghegan
- Re: cost_sort() improvements at 2018-07-08 23:22:15 from Tomas Vondra
- Re: cost_sort() improvements at 2018-07-10 20:56:48 from Tomas Vondra
- Re: cost_sort() improvements at 2018-07-12 14:31:58 from Teodor Sigaev

From | Date | Subject | |
---|---|---|---|

Next Message | Teodor Sigaev | 2018-06-28 16:52:03 | Re: POC: GROUP BY optimization |

Previous Message | Peter Geoghegan | 2018-06-28 16:46:17 | Re: Tips on committing |