Re: verbose cost estimate

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: verbose cost estimate
Date: 2019-12-09 23:25:46
Message-ID: 20191209232546.kfv2tojn42g3ccjq@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 09, 2019 at 05:27:01PM -0500, Greg Stark wrote:
>On Mon, 9 Dec 2019 at 17:14, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> On Sat, Dec 07, 2019 at 11:34:12AM -0500, Tom Lane wrote:
>> >Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
>> >> Jeff said:
>> >>> |What would I find very useful is a verbosity option to get the cost
>> >>> |estimates expressed as a multiplier of each *_cost parameter, rather than
>> >>> |just as a scalar.
>> >
>> >> It seems to me that's "just" a matter of redefining Cost and fixing everything that breaks:
>> >
>> >> struct Cost {
>> >> double seq, rand;
>> >> double cpu_tuple, cpu_index_tuple, cpu_oper;
>> >> double parallel_setup; // This is probably always in startup_cost and never in run_cost
>> >> double parallel_tuple; // This is probably always in run_cost and never in startup_cost
>> >> double disable;
>> >> };
>> >
>> >> I'm perhaps 50% done with that - is there some agreement that's a desirable
>> >> goal and a good way to do it ?
>> >
>> >No, I think this will get rejected out of hand. The implications for
>> >the planner's speed and memory consumption seem quite unacceptable
>> >for the size of the benefit. What you're showing above probably
>> >doubles the size of most Paths, and the added cycles in hot-spots
>> >like add_path seem pretty daunting.
>> >
>>
>> Yeah, that's an issue. But I have to admit my main issue with this
>> proposal is that I have no idea how I'd interpret this Cost. I mean,
>> what do the fields express for different types of paths? How do they
>> contribute to the actual cost of that path?
>
>What I think users would be able to do with this info is understand
>which parameter to tweak to raise the estimated cost of the node.
>
>Everyone knows if you see a index scan is being used but is taking
>longer than a sequential scan then you might try raising
>random_page_cost. But I rarely see people tweaking the more "exotic"
>parameters like operator_tuple_cost or index_tuple_cost and when they
>do they aren't really sure what nodes they're affecting...
>

Well, but that's kinda my point - how would you know that you need to
increase random_page_cost, or how big influence it has? The total is a
fairly non-trivial combination of various cost parameters, effective
cache size etc. Maybe I just don't understand how the cost is split into
those pieces, named the same as the cost GUCs ...

>I remember planning to do a very similar thing back in the 8.3 era and
>never getting around to it. You could imaging even storing these for
>the overall plan in the logs and building a large matrix of actual
>execution values versus these broken out individual costs. Then it
>becomes a standard linear optimization problem to find the optimal
>values for each parameter to minimize inaccurate plan estimates (and
>to identify cases where there are outliers).
>

Maybe, but that's for one query. If you do this for many queries, the
results may be easily contradicting, no?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-12-09 23:31:39 Re: verbose cost estimate
Previous Message Karl O. Pinc 2019-12-09 23:03:43 Re: proposal: minscale, rtrim, btrim functions for numeric