Quick Links

Re: Incremental Sort Cost Estimation Instability

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc:	Tomas Vondra <tomas(at)vondra(dot)me>, David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Incremental Sort Cost Estimation Instability
Date:	2025-06-03 14:02:53
Message-ID:	CAPpHfduossatgSzBzTm-ENnzhiGEj-xE=pk8MK79XR2cm9=vDg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi, Andrei!

On Wed, May 14, 2025 at 1:50 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 9/12/24 16:57, Tomas Vondra wrote:
> > On 9/12/24 12:12, David Rowley wrote:
> >> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> >>> Initial problem causes wrong cost_sort estimation. Right now I think
> >>> about providing cost_sort() the sort clauses instead of (or in addition
> >>> to) the pathkeys.
> >>
> >> I'm not quite sure why the sort clauses matter any more than the
> >> EquivalenceClass. If the EquivalanceClass defines that all members
> >> will have the same value for any given row, then, if we had to choose
> >> any single member to drive the n_distinct estimate from, isn't the
> >> most accurate distinct estimate from the member with the smallest
> >> n_distinct estimate? (That assumes the less distinct member has every
> >> value the more distinct member has, which might not be true)
> > I'm not sure how to fix this, but it seems estimate_num_groups() needs
> > to do things differently. And I agree looking for the minimum ndistinct
> > seems like the right approach. but doesn't estimate_num_groups()
> > supposed to already do that? The comment says:
> Hi,
>
> I have rewritten the code according to the idea of the
> estimate_num_groups responsibility to adjust estimation according to the EC.
> I haven't changed all the places of ngroups estimation - only where the
> planner can access pathkeys to avoid the overhead of passing through all
> the ECs. And added some cache in the EM.
> The most important case for me here is GROUP-BY estimation and
> create_unique_path because I frequently see them as sides of a JOIN.
> It would be also nice to adjust the cost of memoize rescan, but it may
> be a subject of the future improvement.

Thank you for your work on this subject. I've the following notes
about the last patch.
1. Now groupExprs argument of estimate_num_groups() might contain
either Expr's or Pathkey's. I think this should be documented.
2. A new argument relids of estimate_num_groups() also should be documented.
3. I doubt estimate_num_groups() is appropriate place to handle "varno
0". Should we do this in cost_incremental_sort() as before? I see
that it currently analyzes only first EquivalenceMember. But could it
be a different answers between first EquivalenceMember and others?

------
Regards,
Alexander Korotkov
Supabase

In response to

Re: Incremental Sort Cost Estimation Instability at 2025-05-14 10:50:46 from Andrei Lepikhov

Responses

Re: Incremental Sort Cost Estimation Instability at 2025-06-04 21:26:17 from Alexander Korotkov

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alexander Korotkov	2025-06-03 14:05:13	Re: MergeAppend could consider sorting cheapest child path
Previous Message	Tom Lane	2025-06-03 14:01:58	Re: C11 / VS 2019