Re: Dynamic gathering the values for seq_page_cost/xxx_cost

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Dynamic gathering the values for seq_page_cost/xxx_cost
Date: 2020-09-26 00:16:52
Message-ID: CAKU4AWqx_6k9D+2Px-7kRv4SnX-OFCRo3Dyxy+CVJiejsJKgcA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 25, 2020 at 5:15 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:

> On Tue, Sep 22, 2020 at 10:57 AM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
> wrote:
> >
> >
> > My tools set the random_page_cost to 8.6, but based on the fio data, it
> > should be set to 12.3 on the same hardware. and I do see the better plan
> > as well with 12.3. Looks too smooth to believe it is true..
> >
> > The attached result_fio_mytool.tar.gz is my test result. You can use
> > git show HEAD^^ is the original plan with 8.6. git show HEAD^ show the
> > plan changes after we changed the random_page_cost. git show HEAD shows
> > the run time statistics changes for these queries. I also uploaded the
> > test tool [1] for this for your double check.
>
> The scripts seem to start and stop the server, drop caches for every
> query. That's where you are seeing that setting random_page_cost to
> fio based ratio provides better plans. But in practice, these costs
> need to be set on a server where the queries are run concurrently and
> repeatedly. That's where the caching behaviour plays an important
> role. Can we write a tool which can recommend costs for that scenario?

I totally agree with this. Actually the first thing I did was to define a
proper IO workload. At the very beginning I used DIRECT_IO for both seq
read and random read on my SSD, and found the result was pretty bad per
testing (random_page_cost = ~1.6). Then I realized PostgreSQL relies on
prefetch, which is disabled by DIRECT_IO. After I fixed this, I tested
again with the above scenario (cache hit ratio = 0) to verify my IO model;
per testing, it looks good. I am also wondering whether random_page_cost
= 1.1 gives a poor result on my SSD because it ignores the prefetch of
seq reads.
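
Just to make the prefetch effect concrete, here is a tiny sketch of the
arithmetic (the latency numbers below are made up for illustration only,
they are not measurements from my tool):

    # Hypothetical per-page read latencies in microseconds.
    seq_lat_buffered = 10.0   # buffered seq read benefits from OS readahead
    seq_lat_direct = 60.0     # DIRECT_IO seq read loses readahead
    random_lat = 95.0         # random read, roughly the same either way

    print(random_lat / seq_lat_direct)    # ~1.6, the misleading DIRECT_IO ratio
    print(random_lat / seq_lat_buffered)  # ~9.5, closer to the fio based value

With O_DIRECT the seq and random latencies are close, so the ratio
collapses toward 1; with buffered reads the readahead makes seq reads
much cheaper and the ratio goes back up.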

Once I was OK with my IO model, I tested in the way you describe above,
but I also measure the latency of a file system cache hit, which is
handled by get_fs_cache_latency_us in my code (I ignored the shared
buffer hits for now), and I let the user provide a cache_hit_ratio. The
final random_page_cost = real_random_lat / real_seq_lat, where
real_xxx_lat = cache_hit_ratio * fs_cache_lat + (1 - cache_hit_ratio) *
xxx_lat. See functions cal_real_lat and cal_random_page_cost.
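
In Python-like form, the formula above is roughly this (a minimal sketch
only; the real cal_real_lat / cal_random_page_cost live in the test tool
[1] and may differ in detail):

    def cal_real_lat(cache_hit_ratio, fs_cache_lat, device_lat):
        # Blend the file system cache hit latency with the device latency
        # according to the assumed cache hit ratio.
        return cache_hit_ratio * fs_cache_lat + (1 - cache_hit_ratio) * device_lat

    def cal_random_page_cost(cache_hit_ratio, fs_cache_lat, seq_lat, random_lat):
        real_seq_lat = cal_real_lat(cache_hit_ratio, fs_cache_lat, seq_lat)
        real_random_lat = cal_real_lat(cache_hit_ratio, fs_cache_lat, random_lat)
        return real_random_lat / real_seq_lat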

As for the testing with cache considered, I found it is hard to estimate
the cache hit ratio, and also hard to control the hit ratio during a
test. Recently I have been thinking about a method where we get the page
reads and shared buffer hits from the PostgreSQL side, and the real IO
(excluding file system cache hits) at the OS level (just as iotop/pidstat
do). Then we can know the shared buffer hit ratio and the file system
cache hit ratio (assuming they become stable after a long run), and do
the testing based on that. However this would be another piece of manual
work and I still have not got it done.
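
The bookkeeping I have in mind is roughly the following (a sketch only:
blks_hit / blks_read come from pg_stat_database, and os_read_bytes would
be the physical reads per backend as reported by something like
pidstat/iotop, i.e. /proc/<pid>/io):

    BLCKSZ = 8192  # default PostgreSQL block size

    def cache_hit_ratios(blks_hit, blks_read, os_read_bytes):
        # Shared buffer hit ratio straight from the PostgreSQL counters.
        shared_buffer_hit_ratio = blks_hit / (blks_hit + blks_read)

        # Of the reads that missed shared buffers, the ones that did not
        # show up as physical IO at the OS level were served by the file
        # system cache.
        physical_reads = os_read_bytes / BLCKSZ
        fs_cache_hit_ratio = 1 - physical_reads / blks_read

        return shared_buffer_hit_ratio, fs_cache_hit_ratio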

I'd rather not share too many details, but "luckily" many of the cases I
have do not involve the file system cache, which makes things a bit
easier. What I am doing right now is to calculate the random_page_cost
with the above algorithm with only the shared buffer considered, and to
test the real benefit with a real workload to see how it works. If it
works well, I think the only thing left is to handle the file system
cache.

The testing is time consuming since I have to cooperate with many site
engineers, so any improvement on the design would be very helpful.

> How do the fio based cost perform when the queries are run repeatedly?
>
>
That probably would not be good, since I have 280GB+ of file system
cache, and I would have to prepare much more than 280GB of data to test
it properly.

--
Best Regards
Andy Fan
