Re: planner chooses incremental but not the best one

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Nicolas Lutic <n(dot)lutic(at)loxodata(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject: Re: planner chooses incremental but not the best one
Date: 2024-02-15 16:56:15
Message-ID: 575d0bca-d235-4bb1-8901-3866e5f3c250@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/15/24 13:45, Andrei Lepikhov wrote:
> On 15/2/2024 18:10, Tomas Vondra wrote:
>>
>>
>> On 2/15/24 07:50, Andrei Lepikhov wrote:
>>> On 18/12/2023 19:53, Tomas Vondra wrote:
>>>> On 12/18/23 11:40, Richard Guo wrote:
>>>> The challenge is where to get usable information about correlation
>>>> between columns. I only have a couple very rought ideas of what might
>>>> try. For example, if we have multi-column ndistinct statistics, we
>>>> might
>>>> look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from
>>>>
>>>>       ndistinct(b,c,d) / ndistinct(b,c)
>>>>
>>>> If we know how many distinct values we have for the predicate
>>>> column, we
>>>> could then estimate the number of groups. I mean, we know that for the
>>>> restriction "WHERE b = 3" we only have 1 distinct value, so we could
>>>> estimate the number of groups as
>>>>
>>>>       1 * ndistinct(b,c)
>>> Did you mean here ndistinct(c,d) and the formula:
>>> ndistinct(b,c,d) / ndistinct(c,d) ?
>>
>> Yes, I think that's probably a more correct ... Essentially, the idea is
>> to estimate the change in number of distinct groups after adding a
>> column (or restricting it in some way).
> Thanks, I got it. I just think how to implement such techniques with
> extensions just to test the idea in action. In the case of GROUP-BY we
> can use path hook, of course. But what if to invent a hook on clauselist
> estimation?

Maybe.

I have thought about introducing such hook to alter estimation of
clauses, so I'm not opposed to it. Ofc, it depends on where would the
hook be, what would it be allowed to do etc. And as it doesn't exist
yet, it'd be more a "local" improvement to separate the changes into an
extension.

>>> Do you implicitly bear in mind here the necessity of tracking clauses
>>> that were applied to the data up to the moment of grouping?
>>>
>>
>> I don't recall what exactly I considered two months ago when writing the
>> message, but I don't see why we would need to track that beyond what we
>> already have. Shouldn't it be enough for the grouping to simply inspect
>> the conditions on the lower levels?
> Yes, exactly. I've thought about looking into baserestrictinfos and, if
> group-by references a subquery targetlist, into subqueries too.
>

True. Something like that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-02-15 17:18:51 Re: Synchronizing slots from primary to standby
Previous Message Peter Geoghegan 2024-02-15 16:42:07 Re: index prefetching