Re: POC: GROUP BY optimization

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2024-04-12 05:05:14
Message-ID: 8f06a452-55f7-4b72-bb9f-c1f3df44b94b@postgrespro.ru
Lists: pgsql-hackers

On 4/12/24 06:44, Tom Lane wrote:
> If this patch were producing better results I'd be more excited
> about putting more work into it. But on the basis of what I'm
> seeing right now, I think maybe we ought to give up on it.
First, thanks for the deep review. Sometimes only a commit gives us a
chance to get such observations :))).
On a broader note, introducing automatic group-by-order choosing is a
step towards training the optimiser to handle poorly tuned incoming
queries. While it's true that this may initially impact performance,
it's crucial to weigh the potential benefits. So, beforehand, we should
agree: Is it worth it?
If yes, I would say that I often see cases where hashing doesn't work in
grouping: sometimes because of estimation errors, sometimes because the
grouping already has sorted input, and sometimes in analytical queries
when the planner doesn't have enough memory for hashing. In analytical
cases, the only way to speed up queries is sometimes to be smart with
features like IncrementalSort and this one.
About low efficiency: remember the previous version of the GROUP-BY
optimisation, where we disagreed on operator costs and the cost model in
general. In the current version, we went the opposite way, adding small
features step by step. The current commit contains an integral part of
the feature and is designed for safely testing the approach and for
adding more profitable parts later, like choosing the group-by order
according to distinct values or unique indexes on grouping columns.
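
To make the "distinct values" point concrete, here is a minimal sketch in
Python. It is not planner code and does not reflect the cost model: the row
generator, the column cardinalities (2 and 1000 distinct values) and the
comparison counter are all assumptions made for illustration. It sorts the
same rows under both column orders and counts how many single-column
comparisons each sort performs, while checking that the grouping result is
the same either way.

```python
import random
from functools import cmp_to_key
from itertools import groupby


def make_rows(n=2000, seed=0):
    """Hypothetical input: column 0 has 2 distinct values, column 1 has up to 1000."""
    rng = random.Random(seed)
    return [(rng.randrange(2), rng.randrange(1000)) for _ in range(n)]


def sort_counting(rows, key_order):
    """Sort rows by the columns in key_order.

    Returns (sorted_rows, number_of_single_column_comparisons).
    """
    counter = 0

    def cmp(x, y):
        nonlocal counter
        for col in key_order:
            counter += 1  # one column comparison performed
            if x[col] != y[col]:
                return -1 if x[col] < y[col] else 1
        return 0  # equal on every grouping column

    return sorted(rows, key=cmp_to_key(cmp)), counter


def count_groups(sorted_rows):
    """Number of groups a sort-based aggregate would emit for sorted input."""
    return sum(1 for _ in groupby(sorted_rows))


rows = make_rows()
low_first, cost_low_first = sort_counting(rows, (0, 1))    # few distinct values first
high_first, cost_high_first = sort_counting(rows, (1, 0))  # many distinct values first

print("groups:", count_groups(low_first), count_groups(high_first))
print("column comparisons:", cost_low_first, cost_high_first)
```

Both orders produce the same set of groups, but putting the column with more
distinct values first lets most comparisons finish on the first column. A real
decision would of course also have to account for per-operator comparison
costs and existing sort orders.
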
I have gone through the code, guided by the issues you explained in
detail. I see seven issues. Two of them definitely should be scrutinised
right now, and I'm ready to do that.

--
regards,
Andrei Lepikhov
Postgres Professional
