Re: verbose cost estimate

From: Greg Stark <stark(at)mit(dot)edu>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: verbose cost estimate
Date: 2019-12-09 22:27:01
Message-ID: CAM-w4HO8FZckhf1AAzx=8RzRcJSJ3jDg_kz+K4yvE9JgDrgHEw@mail.gmail.com
Lists: pgsql-hackers

On Mon, 9 Dec 2019 at 17:14, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Sat, Dec 07, 2019 at 11:34:12AM -0500, Tom Lane wrote:
> >Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> >> Jeff said:
> >>> |What would I find very useful is a verbosity option to get the cost
> >>> |estimates expressed as a multiplier of each *_cost parameter, rather than
> >>> |just as a scalar.
> >
> >> It seems to me that's "just" a matter of redefining Cost and fixing everything that breaks:
> >
> >> struct Cost {
> >> double seq, rand;
> >> double cpu_tuple, cpu_index_tuple, cpu_oper;
> >> double parallel_setup; // This is probably always in startup_cost and never in run_cost
> >> double parallel_tuple; // This is probably always in run_cost and never in startup_cost
> >> double disable;
> >> };
> >
> >> I'm perhaps 50% done with that - is there some agreement that's a desirable
> >> goal and a good way to do it ?
> >
> >No, I think this will get rejected out of hand. The implications for
> >the planner's speed and memory consumption seem quite unacceptable
> >for the size of the benefit. What you're showing above probably
> >doubles the size of most Paths, and the added cycles in hot-spots
> >like add_path seem pretty daunting.
> >
>
> Yeah, that's an issue. But I have to admit my main issue with this
> proposal is that I have no idea how I'd interpret this Cost. I mean,
> what do the fields express for different types of paths? How do they
> contribute to the actual cost of that path?

What I think users would be able to do with this info is understand
which parameter to tweak to raise the estimated cost of the node.

Everyone knows that if an index scan is being used but is taking longer
than a sequential scan, you might try raising random_page_cost. But I
rarely see people tweaking the more "exotic" parameters like
cpu_operator_cost or cpu_index_tuple_cost, and when they do they aren't
really sure which nodes they're affecting...
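To make that concrete, here is a rough sketch (Python, with made-up
numbers; this is not PostgreSQL code) of what the per-parameter breakdown
would preserve: a node's total cost is essentially a dot product of
per-parameter "units of work" with the cost GUCs, and it is those units
that tell you which knob actually moves a given node. The node shape and
multipliers below are invented for illustration.

# Illustration only, not PostgreSQL code.  A plan node's estimated cost
# is a dot product of per-parameter units of work with the cost GUCs;
# EXPLAIN currently shows only the resulting scalar.
cost_params = {
    "seq_page_cost": 1.0,
    "random_page_cost": 4.0,
    "cpu_tuple_cost": 0.01,
    "cpu_index_tuple_cost": 0.005,
    "cpu_operator_cost": 0.0025,
}

# Made-up units for a hypothetical seq scan over 10,000 pages and
# 1,000,000 tuples, with a one-operator filter applied to each tuple.
node_units = {
    "seq_page_cost": 10_000,
    "random_page_cost": 0,
    "cpu_tuple_cost": 1_000_000,
    "cpu_index_tuple_cost": 0,
    "cpu_operator_cost": 1_000_000,
}

total = sum(node_units[p] * cost_params[p] for p in cost_params)
print(total)  # 22500.0 -- the single scalar EXPLAIN prints today

# With the breakdown it is obvious that cpu_tuple_cost and cpu_operator_cost,
# not random_page_cost, are the knobs that move this particular node.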

I remember planning to do a very similar thing back in the 8.3 era and
never getting around to it. You could even imagine logging these
per-parameter costs for the overall plan and building a large matrix of
actual execution values versus these broken-out individual costs. Then
finding the values for each parameter that minimize the plan estimation
error becomes a standard linear optimization problem (and it would also
identify cases where there are outliers).
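As a minimal sketch of that fitting step (assuming such a matrix had
already been collected; every number below is invented), ordinary least
squares over rows of per-parameter units against measured execution times
recovers per-parameter costs in real time units, and the residuals point
at the outlier plans:

import numpy as np

# Hypothetical matrix: one row per executed plan, one column per cost
# parameter, holding that plan's summed per-parameter units of work.
# Columns: seq_page, random_page, cpu_tuple, cpu_index_tuple, cpu_operator
X = np.array([
    [10_000,     0, 1_000_000,       0, 1_000_000],
    [   100, 5_000,    50_000,  50_000,   100_000],
    [ 2_000,   300,   200_000,  10_000,   400_000],
    [    50, 8_000,    20_000,  80_000,    60_000],
    [ 7_500,   200,   600_000,   5_000,   900_000],
    [   300, 3_000,    80_000,  40_000,   150_000],
], dtype=float)

# Measured execution times for those plans, in milliseconds (also invented).
y = np.array([850.0, 420.0, 310.0, 510.0, 700.0, 330.0])

# Ordinary least squares: per-parameter costs (ms per unit of work) that
# best explain the observed runtimes; large residuals flag outlier plans.
coef, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
for name, c in zip(["seq_page_cost", "random_page_cost", "cpu_tuple_cost",
                    "cpu_index_tuple_cost", "cpu_operator_cost"], coef):
    print(f"{name}: {c:.6f}")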

--
greg
