Re: Optimize kernel readahead using buffer access strategy

From: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)heroku(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Subject: Re: Optimize kernel readahead using buffer access strategy
Date: 2013-12-17 11:50:06
Message-ID: 52B03A6E.607@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I fixed the patch to improve followings.

- Can compile in MacOS.
- Change GUC name enable_kernel_readahead to readahead_strategy.
- Change POSIX_FADV_SEQUNENTIAL to POISX_FADV_NORMAL when we select sequential
access strategy, this reason is later...

I tested simple two access paterns which are followings in pgbench tables scale
size is 1000.

A) SELECT count(bid) FROM pgbench_accounts; (Index only scan)
B) SELECT count(bid) FROM pgbench_accounts; (Seq scan)

In each test, I restart postgres and drop file cache before each test.

Unpatched PG is faster than patched in A and B query. It was about 1.3 times
faster. Result of A query as expected, because patched PG cannot execute
readahead at all. So cache cold situation is bad for patched PG. However, it
might good for cache hot situation, because it doesn't read disk IO at all and
can calculate file cache usage and know which cache is important.

However, result of B query as unexpected, because my patch select
POSIX_FADV_SEQUNENTIAL collectry, but it slow. I cannot understand that,
nevertheless I read kernel source code... Next, I change POSIX_FADV_SEQUNENTIAL
to POISX_FADV_NORMAL in my patch. B query was faster as unpatched PG.

In heavily random access benchmark tests which are pgbench and DBT-2, my patched
PG is about 1.1 - 1.3 times faster than unpatched PG. But postgres buffer hint
strategy algorithm have not optimized for readahead strategy yet, and I don't fix
it. It is still only for ring buffer algorithm in shared_buffer.

Attached printf-debug patch will show you inside postgres buffer strategy. When
you see "S" it selects sequential access strategy, on the other hands, when you
see "R" it selects random access strategy. It might interesting for you. It's
very visual.

Example output is here.
> [mitsu-ko(at)localhost postgresql]$ bin/vacuumdb
> SSSSSSSSSSSSSSSSSSSSSSSSSSS~~SSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
> [mitsu-ko(at)localhost postgresql]$ bin/psql -c "EXPLAIN ANALYZE SELECT count(aid) FROM pgbench_accounts"
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=2854.29..2854.30 rows=1 width=4) (actual time=33.438..33.438 rows=1 loops=1)
> -> Index Only Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.29..2604.29 rows=100000 width=4) (actual time=0.072..20.912 rows=100000 loops=1)
> Heap Fetches: 0
> Total runtime: 33.552 ms
> (4 rows)
>
> RRRRRRRRRRRRRRRRRRRRRRRRRRR~~RRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
> [mitsu-ko(at)localhost postgresql]$ bin/psql -c "EXPLAIN ANALYZE SELECT count(bid) FROM pgbench_accounts"
> SSSSSSSSSSSSSSSSSSSSSSSSSSS~~SSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
> ------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=2890.00..2890.01 rows=1 width=4) (actual time=40.315..40.315 rows=1 loops=1)
> -> Seq Scan on pgbench_accounts (cost=0.00..2640.00 rows=100000 width=4) (actual time=0.112..23.001 rows=100000 loops=1)
> Total runtime: 40.472 ms
> (3 rows)

Thats's all now.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

Attachment Content-Type Size
optimizing_kernel-readahead_using_buffer-access-strategy_v4.patch text/x-diff 14.1 KB
optimizing_kernel-readahead_using_buffer-access-strategy_v4_printf-debug.patch text/x-diff 14.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message MauMau 2013-12-17 12:03:38 Re: [bug fix] pg_ctl always uses the same event source
Previous Message Dimitri Fontaine 2013-12-17 10:03:23 Re: Extension Templates S03E11