Re: Improvement of checkpoint IO scheduler for stable transaction responses

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improvement of checkpoint IO scheduler for stable transaction responses
Date: 2013-07-15 01:08:11
Message-ID: CAMkU=1zi=+mqpcBjzeHcho8bsBENA6Fr+pujNti+js+RqwptNw@mail.gmail.com
Lists: pgsql-hackers

On Sunday, July 14, 2013, Greg Smith wrote:

> On 6/27/13 11:08 AM, Robert Haas wrote:
>
>> I'm pretty sure Greg Smith tried the fixed-sleep thing before and
>> it didn't work that well.
>>
>
> That's correct, I spent about a year whipping that particular horse and
> submitted improvements on it to the community.
> http://www.postgresql.org/message-id/4D4F9A3D.5070700@2ndquadrant.com
> and its updates downthread are good ones to compare this current work
> against.
>
> The important thing to realize about just delaying fsync calls is that it
> *cannot* increase TPS throughput. Not possible in theory, obviously
> doesn't happen in practice. The most efficient way to write things out is
> to delay those writes as long as possible. The longer you postpone a
> write, the more elevator sorting and write combining you get out of the OS.
> This is why operating systems like Linux come tuned for such delayed
> writes in the first place. Throughput and latency are linked; any patch
> that aims to decrease latency will probably slow throughput.
>
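
The write-combining argument quoted above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of how the Linux block layer actually schedules IO: it assumes that all dirty blocks flushed together are sorted, that duplicate writes to the same block collapse into one, and that a run of consecutive block numbers becomes a single request. The function names and the `window` parameter are hypothetical, chosen for illustration.

```python
import random


def count_merged_requests(block_numbers):
    """Count IO requests left after sorting and merging adjacent blocks.

    Toy model: duplicate block numbers collapse into one write, and each
    run of consecutive block numbers is issued as a single request.
    """
    requests = 0
    previous = None
    for block in sorted(set(block_numbers)):
        if previous is None or block != previous + 1:
            requests += 1  # gap found, so a new request starts here
        previous = block
    return requests


def simulate(total_blocks, dirty_blocks, window, seed=0):
    """Dirty random blocks and flush every `window` writes.

    Returns the total number of requests issued across all flushes.
    A larger window stands in for postponing the writes longer.
    """
    rng = random.Random(seed)
    writes = [rng.randrange(total_blocks) for _ in range(dirty_blocks)]
    return sum(
        count_merged_requests(writes[i:i + window])
        for i in range(0, len(writes), window)
    )
```

With `window=1` every write is flushed on its own, so the request count equals the number of writes; a larger window can only keep the count the same or lower it in this model, which is the sense in which postponing writes buys more sorting and combining.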

Do common low level IO benchmarking tools cover this territory? I've
looked at Bonnie, which seems to be the most famous one, and it doesn't
look like it covers effectiveness of write combining at all.

I've done my own casual benchmarking, and the results were astonishingly
bad for the OS/FS. If I over-wrote 1024*1024 blocks of 8KB in random order
and then fsynced the 8GB of data (divided into 8x1GB files, in deference to
PG segment size) it took way longer than if I did the overwrite in block
order and then fsynced that. This was a gift-horse machine not specced out
to be a database server, but the Linux kernel is still the kernel
regardless of the hardware it sits on, so I don't know how much that should
matter. To be clear, the writes did not take longer, it was the fsyncs
that took longer. All writes were successfully absorbed into memory
promptly. Alas, I no longer have access to a machine which can absorb 8GB
of writes into RAM without thinking twice and which I can use for casual
experimentation.
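
A minimal sketch of the kind of test described above, for anyone who wants to try it. This is not the script that produced the results mentioned here; the function name, file names, and defaults are assumptions made for illustration. The defaults are scaled down to 8 files of 1024 blocks each (64MB total) so it runs quickly; the test described above would correspond to `blocks_per_file=131072`. It relies on `os.pwrite`, so it is Unix-only.

```python
import os
import random
import tempfile
import time


def overwrite_then_fsync(directory, n_files=8, blocks_per_file=1024,
                         block_size=8192, random_order=False, seed=0):
    """Overwrite every block once, then fsync each file.

    Returns (write_seconds, fsync_seconds) so the two phases can be
    compared separately.
    """
    paths = [os.path.join(directory, f"segment_{i}") for i in range(n_files)]

    # Pre-create the files at full size and flush them, so the timed pass
    # is an overwrite of existing blocks rather than a file extension.
    zeros = b"\0" * block_size
    for path in paths:
        with open(path, "wb") as f:
            for _ in range(blocks_per_file):
                f.write(zeros)
            f.flush()
            os.fsync(f.fileno())

    # Every (file, block) pair is written exactly once in both modes;
    # only the order differs.
    order = [(f, b) for f in range(n_files) for b in range(blocks_per_file)]
    if random_order:
        random.Random(seed).shuffle(order)

    payload = b"x" * block_size
    fds = [os.open(path, os.O_WRONLY) for path in paths]
    try:
        t0 = time.perf_counter()
        for file_index, block in order:
            os.pwrite(fds[file_index], payload, block * block_size)
        t1 = time.perf_counter()
        for fd in fds:
            os.fsync(fd)
        t2 = time.perf_counter()
    finally:
        for fd in fds:
            os.close(fd)
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for label, flag in (("sequential", False), ("random", True)):
            w, s = overwrite_then_fsync(d, random_order=flag)
            print(f"{label}: writes {w:.3f}s, fsync {s:.3f}s")
```

Whether the difference shows up will depend on the storage, the filesystem, and how much of the data the kernel has already started writing back before the fsync calls begin, so small runs like the default may not show it.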

Cheers,

Jeff
