Re: Dynamic gathering the values for seq_page_cost/xxx_cost

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Dynamic gathering the values for seq_page_cost/xxx_cost
Date: 2020-09-18 13:28:10
Message-ID: CAKU4AWpRv50k8E3tC3tiLWGe2DbKaoZricRh_YJ8y_zK+HdSjQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 28, 2019 at 12:48 AM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> On Tue, Nov 26, 2019 at 08:59:22AM +0800, Andy Fan wrote:
> >The optimizer cost model usually needs 2 inputs: one represents the
> >data distribution and the other represents the capacity of the
> >hardware (CPU, I/O); let's call the latter "system stats".
> >
> >In the Oracle database, the system stats can be gathered with
> >dbms_stats.gather_system_stats [1] on the running hardware. In
> >PostgreSQL, the values are set based on experience (users can change
> >them, but it can be hard to decide which values to use). The PG way
> >is not perfect in theory (in practice it may or may not be good
> >enough); for example, HDDs and SSDs have very different
> >characteristics with regard to seq_page_cost/random_page_cost, and
> >CPU costs may differ across hardware as well.
> >
> >I ran into a paper [2] that did some research on dynamically
> >gathering the values for xxx_cost, and it looks interesting. However,
> >it doesn't provide the code for others to do more research. Before I
> >dive into this, it would be great to hear some suggestions from
> >experts.
> >
> >So, what do you think about this method? Has there been any prior
> >discussion about it, and if so, what was the outcome?
> >
>
> IMHO it would be great to have a tool that helps with tuning those
> parameters, particularly random_page_cost. I'm not sure how feasible
> it is, but if you're willing to do some initial experiments and
> research, I think it's worth looking into.
>
> It's going to be challenging, though, because even random_page_cost=4
> mismatches the "raw" characteristics on any existing hardware. On old
> drives the sequential/random difference is way worse, on SSDs it's about
> right. But then again, we know random_page_cost=1.5 or so works mostly
> fine on SSDs, and that's much lower than just raw numbers.
>
> So it's clearly one thing to measure HW capabilities, and it's another
> thing to conclude what the parameters should be ...
>
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

I recently tried something in this direction, and the results look
promising based on my limited tests.

Since the unit of every xxx_cost is seq_page_cost, how we detect
seq_page_cost is important. In the cost model, the IO cost of a seqscan
is rel->pages * seq_page_cost; it doesn't consider any cache (file
system cache or shared buffers), but it does assume the OS will
prefetch the IO. So to detect seq_page_cost, I enabled prefetch but
avoided the file system cache: 1) drop the file system cache, and
2) open the test file *without* O_DIRECT so that prefetch still works.
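
Roughly like this (an illustrative sketch, not the attached main.c; the
function name, fixed block size, and error handling are simplified):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

#define BLCKSZ 8192

/* average per-page latency (us) of a sequential scan over npages */
static double
measure_seq_read_lat(const char *path, long npages)
{
    char        buf[BLCKSZ];
    struct timeval start, end;
    int         fd = open(path, O_RDONLY); /* no O_DIRECT: let the OS prefetch */

    if (fd < 0)
    {
        perror("open");
        exit(1);
    }

    gettimeofday(&start, NULL);
    for (long i = 0; i < npages; i++)
    {
        if (read(fd, buf, BLCKSZ) != BLCKSZ)
            break;
    }
    gettimeofday(&end, NULL);
    close(fd);

    return ((end.tv_sec - start.tv_sec) * 1000000.0 +
            (end.tv_usec - start.tv_usec)) / npages;
}

This assumes something like "sync; echo 3 > /proc/sys/vm/drop_caches"
was run beforehand, so the pass actually hits the disk.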

To detect the random page read latency, I read pages with pread() at
random offsets. Since the random offsets may repeat during the test,
dropping the file system cache only at the beginning isn't enough, so
here I open the file with O_DIRECT.
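
Sketched below (again illustrative, not the attached main.c;
Linux-specific, since O_DIRECT requires _GNU_SOURCE and an aligned
buffer):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

#define BLCKSZ 8192

/* average latency (us) of nreads random 8kB reads from an npages file */
static double
measure_random_page_lat(const char *path, long npages, long nreads)
{
    char       *buf;
    struct timeval start, end;
    int         fd = open(path, O_RDONLY | O_DIRECT); /* bypass the fs cache */

    if (fd < 0 || posix_memalign((void **) &buf, 4096, BLCKSZ) != 0)
    {
        perror("open/posix_memalign");
        exit(1);
    }
    srandom(time(NULL));

    gettimeofday(&start, NULL);
    for (long i = 0; i < nreads; i++)
    {
        /* offsets may repeat, but O_DIRECT keeps every read uncached */
        off_t offset = (random() % npages) * (off_t) BLCKSZ;

        if (pread(fd, buf, BLCKSZ, offset) != BLCKSZ)
            break;
    }
    gettimeofday(&end, NULL);
    close(fd);
    free(buf);

    return ((end.tv_sec - start.tv_sec) * 1000000.0 +
            (end.tv_usec - start.tv_usec)) / nreads;
}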

I also measured the cost of reading a page from the file system cache;
in my test it is about 10% of a sequential (uncached) read.
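
That measurement is just a warm-up pass followed by a timed pass over
the same pages (same caveats as the sketches above):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

#define BLCKSZ 8192

/* average latency (us) of a page read served from the fs cache */
static double
measure_fs_cache_lat(const char *path, long npages)
{
    char        buf[BLCKSZ];
    struct timeval start, end;
    int         fd = open(path, O_RDONLY); /* no O_DIRECT: use the cache */

    if (fd < 0)
    {
        perror("open");
        exit(1);
    }

    /* warm-up pass: pull the pages into the file system cache */
    for (long i = 0; i < npages; i++)
        if (pread(fd, buf, BLCKSZ, i * (off_t) BLCKSZ) != BLCKSZ)
            break;

    /* timed pass: the same pages, now served from memory */
    gettimeofday(&start, NULL);
    for (long i = 0; i < npages; i++)
        if (pread(fd, buf, BLCKSZ, i * (off_t) BLCKSZ) != BLCKSZ)
            break;
    gettimeofday(&end, NULL);
    close(fd);

    return ((end.tv_sec - start.tv_sec) * 1000000.0 +
            (end.tv_usec - start.tv_usec)) / npages;
}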

After getting the basic numbers about the hardware capability, I let
the user provide a cache hit ratio (this is a place we can further
improve on if this turns out to be the right direction) and derive
random_page_cost from the measured latencies; see the sketch after the
numbers below.
Here are the test results on my hardware:

fs_cache_lat = 0.832025us, seq_read_lat = 8.570290us,
random_page_lat = 73.987732us

cache hit ratio: 1.000000 random_page_cost 1.000000
cache hit ratio: 0.900000 random_page_cost 5.073692
cache hit ratio: 0.500000 random_page_cost 7.957589
cache hit ratio: 0.100000 random_page_cost 8.551591
cache hit ratio: 0.000000 random_page_cost 8.633049
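
To make the calculation explicit: with cache hit ratio h, I take the
expected random read latency h * fs_cache_lat + (1 - h) * random_page_lat
and the expected sequential read latency h * fs_cache_lat +
(1 - h) * seq_read_lat, and random_page_cost is their ratio. This little
program reproduces the table above (the exact code in main.c may differ):

#include <stdio.h>

int
main(void)
{
    /* measured latencies from above, in us */
    const double fs_cache_lat = 0.832025;
    const double seq_read_lat = 8.570290;
    const double random_page_lat = 73.987732;
    const double ratios[] = {1.0, 0.9, 0.5, 0.1, 0.0};

    for (int i = 0; i < 5; i++)
    {
        double h = ratios[i];
        double random_cost = h * fs_cache_lat + (1 - h) * random_page_lat;
        double seq_cost = h * fs_cache_lat + (1 - h) * seq_read_lat;

        printf("cache hit ratio: %f random_page_cost %f\n",
               h, random_cost / seq_cost);
    }
    return 0;
}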

Then I tested the suggested value with the 10GB TPC-H workload. I
compared the plans under 2 different settings of random_page_cost:
1) 4, the default value, and 2) 8.6, the cache hit ratio = 0 value.
11 out of the 22 queries generated a different plan. Finally, I dropped
the caches (both the file system cache and shared_buffers) before
running each query and ran those 11 queries under the 2 settings. The
execution times (in ms) are below.

|     | random_page_cost=4 | random_page_cost=8.6 |
|-----+--------------------+----------------------|
| Q1  |           1425.964 |             1121.928 |
| Q2  |           2553.072 |             2567.450 |
| Q5  |           4397.514 |             1475.343 |
| Q6  |          12576.985 |             4622.503 |
| Q7  |           3459.777 |             2987.241 |
| Q8  |           8360.995 |             8415.311 |
| Q9  |           4661.842 |             2930.370 |
| Q11 |           4885.289 |             2348.541 |
| Q13 |           2610.937 |             1497.776 |
| Q20 |          13218.122 |            10985.738 |
| Q21 |            264.639 |              262.350 |

The attached main.c is the program I used to detect random_page_cost.
result.tar.gz contains the test results; you can run a git log first to
see the differences in the plans and execution stats.

Any feedback is welcome. Thanks!

--
Best Regards
Andy Fan

Attachment Content-Type Size
main.c application/octet-stream 5.8 KB
result.tar.gz application/x-gzip 102.1 KB
