Re: Raid 10 chunksize

From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: Scott Carey <scott(at)richrelevance(dot)com>, Stef Telford <stef(at)ummon(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Raid 10 chunksize
Date: 2009-04-03 00:10:10
Message-ID: 49D553E2.8000405@cheapcomplexdevices.com
Lists: pgsql-performance

Greg Smith wrote:
> On Wed, 1 Apr 2009, Scott Carey wrote:
>
>> Write caching on SATA is totally fine. There were some old ATA drives
>> that when paired with some file systems or OS's would not be safe. There
>> are some combinations that have unsafe write barriers. But there is a
>> standard, well supported ATA command to sync and only return after the
>> data is on disk. If you are running an OS that is anything recent at all,
>> and any disks that are not really old, you're fine.
>
> While I would like to believe this, I don't trust any claims in this
> area that don't have matching tests that demonstrate things working as
> expected. And I've never seen this work.
>
> My laptop has a 7200 RPM drive, which means that if fsync is being
> passed through to the disk correctly I can only fsync <120
> times/second. Here's what I get when I run sysbench on it, starting
> with the default ext3 configuration:

I believe it's ext3 that's cheating in this scenario.

Any chance you can test the program I posted here that
tweaks the inode before the fsync:
http://archives.postgresql.org//pgsql-general/2009-03/msg00703.php

On my system, with the fchmod calls in that program, I was getting one
fsync per disk revolution. Without the fchmod calls, fsync() didn't
wait at all.
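
As a rough check of what "one fsync per disk revolution" means in numbers,
here is a minimal Python sketch. The function names are made up for
illustration; the only inputs are the 7200 RPM figure and the two sysbench
results quoted in this thread, and it assumes that an honest fsync has to
wait for at least one full platter revolution.

```python
def max_fsyncs_per_second(rpm: float) -> float:
    """Upper bound on fsync rate if each fsync waits at least one revolution."""
    return rpm / 60.0  # revolutions per minute -> revolutions per second


def looks_cached(observed_per_second: float, rpm: float) -> bool:
    """True when the observed rate exceeds the one-fsync-per-revolution bound."""
    return observed_per_second > max_fsyncs_per_second(rpm)


if __name__ == "__main__":
    rpm = 7200  # drive speed quoted in this thread
    print("bound:", max_fsyncs_per_second(rpm), "fsyncs/sec")
    for rate in (2507.29, 2612.74):  # sysbench results quoted in this thread
        print(rate, "exceeds bound:", looks_cached(rate, rpm))
```

Both quoted sysbench rates are roughly twenty times the bound, which is
why they cannot come from writes that really reached the platter.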

This was the case on dozens of drives I tried, dating back to
old PATA drives from 2000. Only drives from last century didn't
behave that way - but I can't accuse them of lying because
hdparm showed that they didn't claim to support FLUSH_CACHE.

I think this program shows that practically all hard drives are
physically capable of doing a proper fsync; but annoyingly
ext3 refuses to send the FLUSH_CACHE commands to the drive
unless the inode changed.
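
For anyone who wants to try the idea without fetching the program linked
above, here is a minimal Python sketch of the same kind of test. It is not
that program: the function name, block size, iteration count and file name
are assumptions chosen for illustration. It rewrites one block and calls
fsync() in a loop, optionally changing the file mode with fchmod() first so
that the inode is dirty each time.

```python
import os
import tempfile
import time


def fsync_rate(path: str, iterations: int = 200, touch_inode: bool = False) -> float:
    """Return fsync() calls per second while rewriting one block of `path`."""
    block = b"x" * 8192  # arbitrary block size, an assumption of this sketch
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.write(fd, block)
        os.fsync(fd)  # get the initial allocation out of the way
        start = time.perf_counter()
        for i in range(iterations):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, block)
            if touch_inode:
                # Alternate between two modes so the inode really changes.
                os.fchmod(fd, 0o600 if i % 2 else 0o644)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return iterations / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Create the test file in the current directory so that the
    # filesystem under test is the one you are standing in.
    fd, path = tempfile.mkstemp(dir=".", prefix="fsync_test_")
    os.close(fd)
    try:
        print("fsync only:     %.1f/sec" % fsync_rate(path))
        print("fchmod + fsync: %.1f/sec" % fsync_rate(path, touch_inode=True))
    finally:
        os.remove(path)
```

If the behaviour described above holds on a given system, the second
number should drop to roughly the drive's revolutions per second while the
first stays far higher. The actual numbers will depend on the kernel,
filesystem, mount options and drive.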

> $ uname -a
> Linux gsmith-t500 2.6.28-11-generic #38-Ubuntu SMP Fri Mar 27 09:00:52
> UTC 2009 i686 GNU/Linux
>
> $ mount
> /dev/sda3 on / type ext3 (rw,relatime,errors=remount-ro)
>
> $ sudo hdparm -I /dev/sda | grep FLUSH
> * Mandatory FLUSH_CACHE
> * FLUSH_CACHE_EXT
>
> $ ~/sysbench-0.4.8/sysbench/sysbench --test=fileio --file-fsync-freq=1
> --file-num=1 --file-total-size=16384 --file-test-mode=rndwr run
> sysbench v0.4.8: multi-threaded system evaluation benchmark
>
> Running the test with following options:
> Number of threads: 1
>
> Extra file open flags: 0
> 1 files, 16Kb each
> 16Kb total file size
> Block size 16Kb
> Number of random requests for random IO: 10000
> Read/Write ratio for combined random IO test: 1.50
> Periodic FSYNC enabled, calling fsync() each 1 requests.
> Calling fsync() at the end of test, Enabled.
> Using synchronous I/O mode
> Doing random write test
> Threads started!
> Done.
>
> Operations performed: 0 Read, 10000 Write, 10000 Other = 20000 Total
> Read 0b Written 156.25Mb Total transferred 156.25Mb (39.176Mb/sec)
> 2507.29 Requests/sec executed
>
>
> OK, that's clearly cached writes where the drive is lying about fsync.
> The claim is that since my drive supports both the flush calls, I just
> need to turn on barrier support, right?
>
> [Edit /etc/fstab to remount with barriers]
>
> $ mount
> /dev/sda3 on / type ext3 (rw,relatime,errors=remount-ro,barrier=1)
>
> [sysbench again]
>
> 2612.74 Requests/sec executed
>
> -----
>
> This is basically how this always works for me: somebody claims
> barriers and/or SATA disks work now, no really this time. I test, they
> give answers that aren't possible if fsync were working properly, I
> conclude turning off the write cache is just as necessary as it always
> was. If you can suggest something wrong with how I'm testing here, I'd
> love to hear about it. I'd like to believe you but I can't seem to
> produce any evidence that supports your claims here.
>
> --
> * Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
>
