From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Белялов Дамир Наилевич <d(dot)belyalov(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2023-07-24 09:56:24
Message-ID: 29bd1f4f-6b69-df43-1f3a-6ef1fc501657@enterprisedb.com
Lists: pgsql-hackers
On 7/24/23 04:10, Andrey Lepikhov wrote:
> On 20/7/2023 18:46, Tomas Vondra wrote:
>> On 7/20/23 08:37, Andrey Lepikhov wrote:
>>> On 3/10/2022 21:56, Tom Lane wrote:
>>>> Revert "Optimize order of GROUP BY keys".
>>>>
>>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>>>> several follow-on fixes.
>>>> ...
>>>> Since we're hard up against the release deadline for v15, let's
>>>> revert these changes for now. We can always try again later.
>>>
>>> It may be time to restart the project. As a first step, I rebased the
>>> patch on the current master. It wasn't trivial because of some latest
>>> optimizations (a29eab, 1349d27 and 8d83a5d).
>>> Now, let's repeat the review and rewrite the current patch according
>>> to the reasons stated in the revert commit.
>>
>> I think the fundamental task is to make the costing more reliable, and
>> the commit message 443df6e2db points out a couple challenges in this
>> area. Not sure how feasible it is to address enough of them ...
>>
>> 1) procost = 1.0 - I guess we could make this more realistic by doing
>> some microbenchmarks and tuning the costs for the most expensive cases.
>>
>> 2) estimating quicksort comparisons - This relies on ndistinct
>> estimates, and I'm not sure how much more reliable we can make those.
>> Probably not much :-( Not sure what to do about this, the only thing I
>> can think of is to track "reliability" of the estimates and only do the
>> reordering if we have high confidence in the estimates. That means we'll
>> miss some optimization opportunities, but it should limit the risk.
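To make the "only reorder when confident" idea concrete, here is a toy sketch (not planner code): the per-column "reliability" score is hypothetical, since nothing like it exists in the catalog stats today.

```python
# Toy sketch of gating the reorder on estimate reliability. The
# "reliability" field (0..1) is a hypothetical addition to the stats.

def maybe_reorder(cols, min_reliability=0.8):
    """cols: list of (name, ndistinct, reliability).
    Keep the user's GROUP BY order unless every estimate clears the bar."""
    if any(rel < min_reliability for _, _, rel in cols):
        return cols  # low confidence: leave the column order alone
    # high confidence: put the most selective key first
    return sorted(cols, key=lambda c: c[1], reverse=True)

print(maybe_reorder([("b", 10, 0.90), ("a", 1000, 0.95)]))  # reorders
print(maybe_reorder([("b", 10, 0.30), ("a", 1000, 0.95)]))  # keeps order
```

The point of the gate is exactly the trade-off described above: we give up some wins on shaky estimates in exchange for bounding the regression risk.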
> I read up on the history of this thread.
> As I see it, all the problems mentioned above can be avoided by
> dropping the new cost model entirely: we can simply order the GROUP BY
> columns by their 'ndistinct' estimates.
> I see the reason for introducing the cost model in [1]. The main
> concern there was that with this patch, people could no longer optimize
> a query themselves by arranging the columns in a better order. But now
> we at least have a GUC to switch off the behaviour introduced here, and
> extensions such as the well-known pg_hint_plan can help with automation.
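For illustration, the proposed heuristic boils down to the following toy sketch (not planner code; column names and ndistinct values are made up):

```python
# Toy sketch of the proposal: order GROUP BY keys by descending
# ndistinct, so the leading sort key settles most comparisons and the
# lower-order comparison functions are rarely invoked.

def reorder_groupby(cols):
    """cols: list of (column_name, estimated_ndistinct)."""
    return sorted(cols, key=lambda c: c[1], reverse=True)

keys = [("region", 5), ("customer_id", 100_000), ("status", 3)]
print(reorder_groupby(keys))
# customer_id comes first: most tuple pairs already differ on it.
```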
I think the main concern is that if we reorder the group keys and get it
wrong, it's a regression. If that happens often (due to costing based on
poor stats), it's a problem. Yes, there's a GUC, but that's a rather
blunt instrument, unfortunately.
> So, how about committing the undoubted part of the feature and working
> on the cost model in a new thread?
>
But Tom's commit message says this:
Worse, to arrive at estimates of the number of calls made to the
lower-order-column comparison functions, the code needs to make
estimates of the numbers of distinct values of multiple columns,
which are necessarily even less trustworthy than per-column stats.
so I'm not sure this really counts as "undoubted".
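To illustrate why the multi-column estimates are the weak spot: a simple way to combine per-column ndistincts is their product clamped to the row count, which silently assumes the columns are independent (a sketch only, not PostgreSQL's actual estimator):

```python
# Toy illustration: combining per-column ndistinct estimates under an
# independence assumption. Correlated columns make this badly wrong.

def combined_ndistinct(per_col, nrows):
    """Product of per-column ndistinct estimates, clamped to nrows."""
    est = 1
    for nd in per_col:
        est *= nd
    return min(est, nrows)

# city and zip_code are strongly correlated, so the true number of
# distinct (city, zip) pairs is close to 40_000 -- but the independence
# assumption inflates the estimate all the way to the clamp:
print(combined_ndistinct([500, 40_000], 1_000_000))  # 1000000
```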
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company