Re: performance on new linux box

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Ben Chobot <bench(at)silentmedia(dot)com>, Craig James <craig_james(at)emolecules(dot)com>, "Timothy(dot)Noonan(at)emc(dot)com" <Timothy(dot)Noonan(at)emc(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance on new linux box
Date: 2010-07-16 03:42:20
Message-ID: 1E1BB935-72EF-4312-81B6-5E033D66BDA1@richrelevance.com
Lists: pgsql-performance


On Jul 15, 2010, at 6:22 PM, Scott Marlowe wrote:

> On Thu, Jul 15, 2010 at 10:30 AM, Scott Carey <scott(at)richrelevance(dot)com> wrote:
>>
>> On Jul 14, 2010, at 7:50 PM, Ben Chobot wrote:
>>
>>> On Jul 14, 2010, at 6:57 PM, Scott Carey wrote:
>>>
>>>> But none of this explains why a 4-disk raid 10 is slower than a 1 disk system. If there is no write-back caching on the RAID, it should still be similar to the one disk setup.
>>>
>>> Many raid controllers are smart enough to always turn off write caching on the drives, and also disable the feature on their own buffer without a BBU. Add a BBU, and the cache on the controller starts getting used, but *not* the cache on the drives.
>>
>> This does not make sense.
>
> Basically, you can have cheap, fast and dangerous (drive with write
> cache enabled, which responds positively to fsync even when it hasn't
> actually fsynced the data). You can have cheap, slow and safe with a
> drive that has a cache but since it'll be fsyncing it all the time
> the write cache won't actually get used, or fast, expensive, and safe,
> which is what a BBU RAID card gets by saying the data is fsynced when
> it's actually just in cache, but a safe cache that won't get lost on
> power down.
>
> I don't find it that complicated.

It doesn't make sense that a raid 10 will be slower than a 1-disk setup unless the former respects fsync() and the latter does not. Individual drive write cache does not explain the situation. That is what does not make sense.
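One rough way to check whether a given box respects fsync() is to time a loop of small write-plus-fsync calls. This is only a sketch: the function names, the 512-byte payload, and the loop count are illustrative choices, and the "one commit per rotation" ceiling is a rule of thumb for a single rotating disk with no write-back cache, not a precise limit.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, count=200, payload=b"x" * 512):
    """Append a small record and fsync() it `count` times; return syncs/sec."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fsync(fd)  # should not return until the data is durable
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed


def max_honest_rate(rpm):
    """Rule-of-thumb ceiling (assumption): about one commit per rotation."""
    return rpm / 60.0


if __name__ == "__main__":
    rate = fsyncs_per_second(".")
    ceiling = max_honest_rate(7200)
    print(f"{rate:.0f} fsyncs/sec; a lone 7200 rpm disk tops out near {ceiling:.0f}")
```

A measured rate far above the ceiling for the drive in question suggests that some layer (FS, OS, controller, or drive) is acknowledging the sync before the data is on the platter. Run it against a directory on the volume you actually care about.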

When in _write-through_ mode, there is no reason to turn off the drive's write cache unless the drive does not properly respect its cache-flush command, or the RAID card is too dumb to issue cache-flush commands. The RAID card simply has to issue its writes, then issue the flush commands, then return to the OS when those complete. With drive write caches on, this is perfectly safe. The only way it is unsafe is if the drive lies and returns from a cache flush before the data from its cache is actually flushed.

Some SSDs on the market currently lie. Of the thousands of hard drive models sold in the server, desktop, and laptop space over the last decade, only a handful did not respect the cache-flush command properly, and to my knowledge none of them in the SAS/SCSI or 'enterprise SATA' space lie. Information on this topic has come across this list several times.

The explanation for why one setup respects fsync() and another does not almost always lies in the FS + OS combination. HFS+ on OSX does not respect fsync. ext3 until recently only did fdatasync() when you told it to fsync() (which is fine for postgres' transaction log anyway).
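To make the FS + OS difference concrete, here is a small sketch of the sync calls an application can choose between. The helper names are made up for illustration. It assumes a Unix-like system; on OSX the F_FULLFSYNC fcntl is the documented way to ask for the drive's own cache to be flushed, since plain fsync() there does not do so.

```python
import fcntl
import os


def sync_data(fd):
    """fdatasync() where the platform has it, otherwise fall back to fsync()."""
    if hasattr(os, "fdatasync"):
        os.fdatasync(fd)  # flushes file data, may skip non-essential metadata
    else:
        os.fsync(fd)


def sync_to_platter(fd):
    """Use F_FULLFSYNC if the platform defines it (OSX), else plain fsync()."""
    full = getattr(fcntl, "F_FULLFSYNC", None)
    if full is not None:
        try:
            fcntl.fcntl(fd, full)  # asks the drive to flush its write cache too
            return
        except OSError:
            pass  # not supported on this filesystem; fall through to fsync()
    os.fsync(fd)
```

Neither helper can make a lying drive honest; they only control what the OS asks the storage stack to do.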

A raid card, especially one with SAS/SCSI drives, has no reason to turn off the drive's write cache unless it _wants_ to return to the OS before the data is on the drive. That condition occurs in write-back cache mode, when the RAID card's cache is made safe by a battery or some other mechanism. In that case, it should turn off the drive's write cache so that it can be sure the data is on disk when power fails, without having to issue the cache-flush command on every write. That way, it can remove data from its RAM as soon as the drive returns from the write.
In write-through mode it should turn the caches back on and rely on the flush command to carry direct writes, cache-flush requests, and barrier requests through to the disk. It could optionally turn the caches off, but that won't improve data safety unless the drive cannot faithfully flush its cache.
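The write-through argument can be sketched as a toy model. Everything here is hypothetical (the `Drive` class, its flags, and the helper functions are illustrative, not any controller's real behavior); it only shows why a drive write cache is safe in write-through mode as long as the flush command is honored.

```python
class Drive:
    """Toy drive: a volatile write cache in front of a durable platter."""

    def __init__(self, write_cache=True, honest_flush=True):
        self.write_cache = write_cache
        self.honest_flush = honest_flush
        self.cache = []
        self.platter = []

    def write(self, block):
        if self.write_cache:
            self.cache.append(block)  # fast, but volatile
        else:
            self.platter.append(block)  # slow, but durable on return

    def flush(self):
        if self.honest_flush:
            self.platter.extend(self.cache)
            self.cache.clear()
        # a lying drive returns here with data still sitting in its cache

    def power_loss(self):
        self.cache.clear()


def write_through_commit(drive, blocks):
    """Write-through controller: issue writes, flush, then ack the OS."""
    for block in blocks:
        drive.write(block)
    drive.flush()
    return True  # only now does the OS see the write as complete


def survives_power_loss(drive, blocks):
    """Commit, pull the plug, and check that every acked block is on the platter."""
    write_through_commit(drive, blocks)
    drive.power_loss()
    return all(block in drive.platter for block in blocks)


if __name__ == "__main__":
    blocks = ["a", "b", "c"]
    print(survives_power_loss(Drive(write_cache=True, honest_flush=True), blocks))
    print(survives_power_loss(Drive(write_cache=True, honest_flush=False), blocks))
    print(survives_power_loss(Drive(write_cache=False, honest_flush=False), blocks))
```

In this model, turning the drive cache off only changes the outcome for the drive that lies about flushing, which is the case described above.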
