Re: Incremental Sort Cost Estimation Instability

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Incremental Sort Cost Estimation Instability
Date: 2025-05-14 10:50:46
Message-ID: 2a3c1c60-b38a-4a7f-83ec-940e4abbb6ac@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 9/12/24 16:57, Tomas Vondra wrote:
> On 9/12/24 12:12, David Rowley wrote:
>> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>>> Initial problem causes wrong cost_sort estimation. Right now I think
>>> about providing cost_sort() the sort clauses instead of (or in addition
>>> to) the pathkeys.
>>
>> I'm not quite sure why the sort clauses matter any more than the
>> EquivalenceClass. If the EquivalanceClass defines that all members
>> will have the same value for any given row, then, if we had to choose
>> any single member to drive the n_distinct estimate from, isn't the
>> most accurate distinct estimate from the member with the smallest
>> n_distinct estimate? (That assumes the less distinct member has every
>> value the more distinct member has, which might not be true)
> I'm not sure how to fix this, but it seems estimate_num_groups() needs
> to do things differently. And I agree looking for the minimum ndistinct
> seems like the right approach. but doesn't estimate_num_groups()
> supposed to already do that? The comment says:
Hi,

I have rewritten the code according to the idea of the
estimate_num_groups responsibility to adjust estimation according to the EC.
I haven't changed all the places of ngroups estimation - only where the
planner can access pathkeys to avoid the overhead of passing through all
the ECs. And added some cache in the EM.
The most important case for me here is GROUP-BY estimation and
create_unique_path because I frequently see them as sides of a JOIN.
It would be also nice to adjust the cost of memoize rescan, but it may
be a subject of the future improvement.

--
regards, Andrei Lepikhov

Attachment Content-Type Size
v0-0001-Employ-EquivalenceClass-to-adjust-ndistinct-estim.patch text/x-patch 25.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Florents Tselai 2025-05-14 15:33:18 Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part
Previous Message John Naylor 2025-05-14 10:36:27 vectorized CRC on ARM64