From: | "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru> |
---|---|

To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru> |

Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |

Subject: | Re: POC: GROUP BY optimization |

Date: | 2022-02-01 09:48:11 |

Message-ID: | 82114b8f-d3a1-ee58-f840-466e695f90c0@postgrespro.ru |

Views: | Raw Message | Whole Thread | Download mbox | Resend email |

Thread: | |

Lists: | pgsql-hackers |

> On 7/22/21 3:58 AM, Tomas Vondra wrote:

> I've simplified the costing a bit, and the attached version actually

> undoes all the "suspicious" plan changes in postgres_fdw. It changes one

> new plan, but that seems somewhat reasonable, as it pushes sort to the

> remote side.

I tried to justify heap-sort part of the compute_cpu_sort_cost() routine

and realized, that here we may have a mistake.

After a week of efforts, I don't found any research papers on dependence

of bounded heap-sort time compexity on number of duplicates.

So, I suppose self-made formula, based on simple logical constructions:

1. We should base on initial formula: cost ~ N*LOG2(M), where M -

output_tuples.

2. Realize, that full representation of this formula is:

cost ~ N*LOG2(min{N,M})

3. In the case of multicolumn, number of comparisons for each next

column can be estimated by the same formula, but arranged to a number of

tuples per group:

comparisons ~ input * LOG2(min{input,M})

4. Realize, that for the case, when M > input, we should change this

formula a bit:

comparisons ~ max{input,M} * LOG2(min{input,M})

Remember, that in our case M << tuples.

So, general formula for bounded heap sort can be written as:

cost ~ N * sum(max{n_i,M}/N * LOG2(min{n_i,M})), i=1,ncols

where n_1 == N, n_i - number of tuples per group, estimated from

previous iteration.

In attachment - an implementation of this approach.

--

regards,

Andrey Lepikhov

Postgres Professional

Attachment | Content-Type | Size |
---|---|---|

bounded_heap_sort_fix.txt | text/plain | 3.1 KB |

- Re: POC: GROUP BY optimization at 2022-01-20 05:05:57 from Andrey V. Lepikhov

From | Date | Subject | |
---|---|---|---|

Next Message | Daniel Gustafsson | 2022-02-01 10:09:08 | Re: Support for NSS as a libpq TLS backend |

Previous Message | Pavel Borisov | 2022-02-01 08:38:01 | Re: Make mesage at end-of-recovery less scary. |