Re: Improve planner cost estimations for alternative subplans

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Alexey Bashtanov <bashtanov(at)imap(dot)cc>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve planner cost estimations for alternative subplans
Date: 2020-06-20 23:30:30
Message-ID: 20200620233030.jcxdjd6njwaajzrr@development
Lists: pgsql-hackers

On Wed, Jun 17, 2020 at 06:21:58PM -0700, Melanie Plageman wrote:
>On Fri, Jun 5, 2020 at 9:08 AM Alexey Bashtanov <bashtanov(at)imap(dot)cc> wrote:
>
>>
>> In [1] we found a situation where it leads to a suboptimal plan:
>> it bloats the overall cost into large figures, so a decision related
>> to an outer part of the plan looks negligible to the planner,
>> and as a result it doesn't elaborate on choosing the optimal one.
>>
>>
>Did this geometric average method result in choosing the desired plan for
>this case?
>
>
>> The patch is to fix it. Our linear model for costs cannot quite accommodate
>> the piecewise linear nature of alternative subplans,
>> so it is based on ugly heuristics and still cannot be very precise,
>> but I think it's better than the current one.
>>
>> Thoughts?
>>
>>
>Is there another place in planner where two alternatives are averaged
>together and that cost is used?
>
>To me, it feels a little bit weird that we are averaging together the
>startup cost of a plan which will always have a 0 startup cost and a
>plan that will always have a non-zero startup cost and the per tuple
>cost of a plan that will always have a negligible per tuple cost and one
>that might have a very large per tuple cost.
>
>I guess it feels different because instead of comparing alternatives you
>are blending them.
>
>I don't have any academic basis for saying that the alternatives' costs
>shouldn't be averaged together for use in the rest of the plan, so I
>could definitely be wrong.
>

I agree it feels weird. Even if it actually improved the problematic
case, I think it'll be quite hard to convince ourselves this helps in
general. For example, for cases that actually end up using the first
plan, this is bound to make the estimates worse. I find it hard to
believe it won't cause regressions in at least some cases.

Maybe this heuristic really is better than the old one, but I think we
need to understand why; a single query probably is not enough.

I think the crucial limitation here is that we don't know which of the
alternative plans will be used. Is there a chance to improve this,
perhaps by making some sort of guess?

I'm not particularly familiar with AlternativeSubPlans, but I see we're
picking one of them in nodeSubplan.c, based on plan_rows. Can't we do the
same thing in cost_qual_eval_walker?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
