Re: Dynamic gathering the values for seq_page_cost/xxx_cost

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Dynamic gathering the values for seq_page_cost/xxx_cost
Date: 2019-11-27 16:48:21
Message-ID: 20191127164821.lspxyrf3c5r6zu5n@development
Lists: pgsql-hackers

On Tue, Nov 26, 2019 at 08:59:22AM +0800, Andy Fan wrote:
>The optimizer cost model usually needs two inputs: one represents the
>data distribution, and the other represents the capacity of the
>hardware (CPU, I/O). Let's call the latter "system stats".
>
>In Oracle, the system stats can be gathered with
>dbms_stats.gather_system_stats [1] on the running hardware. In
>PostgreSQL, the values are set based on experience (users can change
>them as well, but it is hard to decide which values to use). The
>PostgreSQL way is not perfect in theory (in practice, it may or may not
>be good enough). For example, HDDs and SSDs differ in their
>seq_page_cost/random_page_cost characteristics, and CPU costs may also
>differ across hardware.
>
>I ran into a paper [2] that did some research on dynamically gathering
>the values for xxx_cost, and it looks interesting. However, it doesn't
>provide the code for others to do more research. Before I dive into
>this, it would be great to hear some suggestions from experts.
>
>So what do you think about this method? Have we had some discussion
>about this before, and what was the result?
>

IMHO it would be great to have a tool that helps with tuning those
parameters, particularly random_page_cost. I'm not sure how feasible it
is, but if you're willing to do some initial experiments and research,
I think it's worth looking into.

It's going to be challenging, though, because even random_page_cost=4
mismatches the "raw" characteristics of any existing hardware. On old
drives the sequential/random difference is way worse; on SSDs it's about
right. But then again, we know random_page_cost=1.5 or so works mostly
fine on SSDs, and that's much lower than the raw numbers.
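
To make the "raw" side of this concrete, here is a minimal Python sketch
of how a sequential/random page-read ratio might be measured. It is not
taken from the paper or from any existing tool; the function names
(make_scratch_file, time_reads, raw_random_to_seq_ratio) are
hypothetical, and the 8 kB page size is assumed to match PostgreSQL's
default block size. It relies on os.pread, so it assumes a Unix-like
system.

```python
import os
import random
import tempfile
import time

PAGE_SIZE = 8192  # assumed to match PostgreSQL's default block size


def make_scratch_file(n_pages):
    """Create a temporary file holding n_pages pages of random bytes."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(n_pages * PAGE_SIZE))
    return path


def time_reads(fd, offsets):
    """Return the seconds spent reading one page at each offset, in order."""
    start = time.perf_counter()
    for off in offsets:
        os.pread(fd, PAGE_SIZE, off)
    return time.perf_counter() - start


def raw_random_to_seq_ratio(path, n_pages, seed=0):
    """Return random-read time divided by sequential-read time.

    Caveat: a small or freshly written file is served from the OS page
    cache, so the result then says nothing about the storage device.
    A meaningful run needs a file much larger than RAM, or cold caches.
    """
    sequential = [i * PAGE_SIZE for i in range(n_pages)]
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)

    fd = os.open(path, os.O_RDONLY)
    try:
        seq_time = time_reads(fd, sequential)
        rand_time = time_reads(fd, shuffled)
    finally:
        os.close(fd)

    # Guard against a zero denominator on very coarse clocks.
    return rand_time / max(seq_time, 1e-9)
```

Even with a careful setup, the number this produces is only the raw
device ratio. It ignores caching, prefetching and concurrent load, so it
would not be expected to translate directly into a good random_page_cost.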

So it's clearly one thing to measure HW capabilities, and it's another
thing to conclude what the parameters should be ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
