From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Incremental Sort Cost Estimation Instability |
Date: | 2025-06-03 14:02:53 |
Message-ID: | CAPpHfduossatgSzBzTm-ENnzhiGEj-xE=pk8MK79XR2cm9=vDg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi, Andrei!
On Wed, May 14, 2025 at 1:50 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 9/12/24 16:57, Tomas Vondra wrote:
> > On 9/12/24 12:12, David Rowley wrote:
> >> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> >>> Initial problem causes wrong cost_sort estimation. Right now I think
> >>> about providing cost_sort() the sort clauses instead of (or in addition
> >>> to) the pathkeys.
> >>
> >> I'm not quite sure why the sort clauses matter any more than the
> >> EquivalenceClass. If the EquivalanceClass defines that all members
> >> will have the same value for any given row, then, if we had to choose
> >> any single member to drive the n_distinct estimate from, isn't the
> >> most accurate distinct estimate from the member with the smallest
> >> n_distinct estimate? (That assumes the less distinct member has every
> >> value the more distinct member has, which might not be true)
> > I'm not sure how to fix this, but it seems estimate_num_groups() needs
> > to do things differently. And I agree looking for the minimum ndistinct
> > seems like the right approach. but doesn't estimate_num_groups()
> > supposed to already do that? The comment says:
> Hi,
>
> I have rewritten the code according to the idea of the
> estimate_num_groups responsibility to adjust estimation according to the EC.
> I haven't changed all the places of ngroups estimation - only where the
> planner can access pathkeys to avoid the overhead of passing through all
> the ECs. And added some cache in the EM.
> The most important case for me here is GROUP-BY estimation and
> create_unique_path because I frequently see them as sides of a JOIN.
> It would be also nice to adjust the cost of memoize rescan, but it may
> be a subject of the future improvement.
Thank you for your work on this subject. I've the following notes
about the last patch.
1. Now groupExprs argument of estimate_num_groups() might contain
either Expr's or Pathkey's. I think this should be documented.
2. A new argument relids of estimate_num_groups() also should be documented.
3. I doubt estimate_num_groups() is appropriate place to handle "varno
0". Should we do this in cost_incremental_sort() as before? I see
that it currently analyzes only first EquivalenceMember. But could it
be a different answers between first EquivalenceMember and others?
------
Regards,
Alexander Korotkov
Supabase
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Korotkov | 2025-06-03 14:05:13 | Re: MergeAppend could consider sorting cheapest child path |
Previous Message | Tom Lane | 2025-06-03 14:01:58 | Re: C11 / VS 2019 |