Re: Sorting writes during checkpoint

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting writes during checkpoint
Date: 2008-07-16 05:19:22
Message-ID: Pine.GSO.4.64.0807092139390.8953@westnet.com
Lists: pgsql-hackers pgsql-patches

On Mon, 7 Jul 2008, ITAGAKI Takahiro wrote:

> I plan to test it on RAID-5 disks, where sequential writes are much
> better than random writes. I'll send the results as evidence.

If you're running more tests here, please turn on log_checkpoints and
collect the logs while the test is running. I'm really curious if there's
any significant difference in what that reports here in the sorted case
vs. the regular one.
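
For comparing the two runs, the numbers worth lining up are the write, sync, and total times on each "checkpoint complete" line. A minimal sketch for pulling those out of a server log, assuming the 8.3-style message wording (write=... s, sync=... s, total=... s); the function name and the log file names in the usage line are hypothetical:

```shell
# Print "write sync total" (in seconds) for each completed checkpoint.
# Assumes 8.3-style log_checkpoints lines; adjust the pattern if the
# wording in your server's log differs.
checkpoint_times() {
    sed -n 's/.*checkpoint complete:.*write=\([0-9.]*\) s, sync=\([0-9.]*\) s, total=\([0-9.]*\) s.*/\1 \2 \3/p' "$1"
}
```

Running something like "checkpoint_times sorted.log" and "checkpoint_times unsorted.log" side by side should make any shift in the sync column easy to spot.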

> Smoothed checkpoint in 8.3 spreads write() calls, but calls fsync() all
> at once. With sorted writes, we can call fsync() segment-by-segment,
> after writing the dirty pages contained in each segment. It could
> improve worst response time during checkpoints.

Further decreasing the amount of data that is fsync'd at any point in time
might be a bigger improvement than the sorting itself (so far I haven't
seen anything really significant just from the sort, but I'm still
testing).

One thing I didn't see any comments from you on is how/if the sorted
writes patch lowers worst-case latency. That's the area I'd hope an
improved fsync protocol would help most with, rather than TPS, which might
even go backwards because writes won't be as bunched and therefore will
have more seeking. It's easy enough to analyze the data coming from
"pgbench -l" to figure that out; here is an example shell snippet that
shows just the worst ones:

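# Run pgbench in the background so that $! captures its PID
pgbench -l -N <db> &
p=$!
wait $p
# The per-transaction log file is named after the pgbench PID
mv pgbench_log.${p} pgbench.log
# Third field is the latency; show the ten worst
cut -f 3 -d " " pgbench.log | sort -n | tail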

Actually graphing the latencies can be even more instructive; I have some
examples of that on my web page you may have seen before.
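
Short of a full graph, a percentile summary is a quick way to compare worst-case latency between two runs. This is only a sketch, assuming the same log layout as the snippet above (latency in the third space-separated field); the function name is hypothetical and the percentiles use the simple nearest-rank method:

```shell
# Summarize per-transaction latency from a "pgbench -l" log file.
# Assumes the latency is the third space-separated field.
latency_summary() {
    cut -f 3 -d " " "$1" | sort -n | awk '
        { v[NR] = $1 }                        # values arrive sorted ascending
        END {
            if (NR == 0) exit 1               # nothing to summarize
            p90 = int((NR * 90 + 99) / 100)   # nearest rank: ceil(0.90 * NR)
            p99 = int((NR * 99 + 99) / 100)   # nearest rank: ceil(0.99 * NR)
            printf "count=%d p90=%d p99=%d max=%d\n", NR, v[p90], v[p99], v[NR]
        }'
}
```

Calling "latency_summary pgbench.log" on the sorted and unsorted runs gives one line each, which is enough to see whether the tail moved even when TPS did not.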

> In addition, the current smgr layer is completely useless because
> it cannot be extended dynamically and cannot handle multiple md-layer
> modules. I would rather merge current smgr and part of bufmgr into
> a new smgr and add smgr_hook() than bulk_io_hook().

I don't have a firm enough opinion about the code to comment on this
specific suggestion, but I will say that I've found the amount of layering
in this area makes it difficult to understand just what's going on
sometimes (especially when new to it). A lot of that abstraction felt a
bit pass-through to me, and anything that would collapse it a bit would
be helpful for streamlining the code instrumentation done with things
like dtrace.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
