Re: Raid 10 chunksize

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, Stef Telford <stef(at)ummon(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Raid 10 chunksize
Date: 2009-03-26 22:02:08
Message-ID: C5F14970.3CA1%scott@richrelevance.com
Lists: pgsql-performance


On 3/26/09 2:44 PM, "Scott Carey" <scott(at)richrelevance(dot)com> wrote:

>
>
> On 3/25/09 9:43 PM, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz> wrote:
>
>> Stef Telford wrote:
>>>
>>> Hello Mark,
>>> Okay, so, take all of this with a pinch of salt, but I have the
>>> same config (pretty much) as you, with checkpoint_segments raised to
>>> 192. The 'test' database server is a Q8300, 8GB RAM, 2 x 7200rpm SATA
>>> drives on the motherboard controller, which I then striped together
>>> with LVM: lvcreate -n data_lv -i 2 -I 64 mylv -L 60G (expandable
>>> under lvm2). That gives me a stripe size of 64. Running pgbench with
>>> the same scaling factors:
>>>
>>> starting vacuum...end.
>>> transaction type: TPC-B (sort of)
>>> scaling factor: 100
>>> number of clients: 24
>>> number of transactions per client: 12000
>>> number of transactions actually processed: 288000/288000
>>> tps = 1398.907206 (including connections establishing)
>>> tps = 1399.233785 (excluding connections establishing)
>>>
>>> It's also running ext4dev, but this is the 'playground' server,
>>> not the real iron (and I dread trying that on the real iron). In short,
>>> I think the chunksize/stripesize is killing you. Personally, I would
>>> go for 64 or 128 .. that's just my 2c .. feel free to
>>> ignore/scorn/laugh as applicable ;)
>>>
>>>
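
For anyone wanting to try that suggestion on an md RAID 10 rather than an
LVM stripe, the chunk size is set when the array is created. A rough
sketch, with made-up device names (--chunk is in KB):

    mdadm --create /dev/md0 --level=10 --chunk=64 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde

As far as I know an array of that vintage can't have its chunk size
changed in place, so it's worth benchmarking 64 vs. 128 before loading
real data.
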
>> Stef - I suspect that your (quite high) tps is because your SATA disks
>> are not honoring the fsync() request for each commit. SCSI/SAS disks
>> tend, by default, to flush their cache on fsync - ATA/SATA tend not to.
>> Some filesystems (e.g. xfs) will try to work around this with write
>> barrier support, but it depends on the disk firmware.
>
> This has not been true for a while now. SATA disks will flush their
> write cache when told, and properly adhere to write barriers. Of course,
> not all file systems send the right write barrier and flush commands
> to SATA drives (UFS for example, and older versions of ext3).
>
> It may be the other way around: your SAS drives might have the write cache
> disabled for no good reason other than to protect against file systems that
> don't issue those commands correctly.
>

A little extra info here: md, LVM, and some other tools do not allow the
file system to use write barriers properly, so those are on the bad list
for data integrity with SAS or SATA write caches and no battery back-up.
However, this is NOT an issue for the postgres data partition: data fsync
still works fine; it's the file system journal that might see out-of-order
writes. For the xlogs, write barriers are not important either - only an
fsync() that doesn't lie.
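
If you want to see what a given drive is actually doing, the write cache
state can be queried from Linux. A rough sketch - it assumes hdparm and
sdparm are installed, and the device names here are made up:

    # SATA: show the volatile write cache setting, and turn it off if
    # you have neither working barriers nor a battery-backed cache
    hdparm -W /dev/sda
    hdparm -W0 /dev/sda

    # SAS/SCSI: show the WCE (write cache enable) bit
    sdparm --get=WCE /dev/sdb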

As an additional note, ext4 checksums its journal blocks, so it is
resistant to out-of-order writes causing trouble. The test being compared
against here was run on ext4, and the speed increase is most likely partly
due to that.
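
On kernels of this era the journal checksumming, like the barrier
behavior, is a mount option. Roughly, with a made-up device and mount
point (option names may vary between ext4dev versions):

    mount -o barrier=1,journal_checksum /dev/myvg/data_lv /srv/pgdata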

>>
>> Thanks for your reply!
>>
>> Mark
>>
