Re: Raid 10 chunksize

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, Stef Telford <stef(at)ummon(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Raid 10 chunksize
Date: 2009-03-26 22:02:08
Message-ID: C5F14970.3CA1%scott@richrelevance.com
Lists: pgsql-performance


On 3/26/09 2:44 PM, "Scott Carey" <scott(at)richrelevance(dot)com> wrote:

>
>
> On 3/25/09 9:43 PM, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz> wrote:
>
>> Stef Telford wrote:
>>>
>>> Hello Mark,
>>> Okay, so, take all of this with a pinch of salt, but I have the
>>> same config (pretty much) as you, with checkpoint_segments raised to
>>> 192. The 'test' database server is a Q8300, 8GB RAM, 2 x 7200rpm SATA
>>> drives on the motherboard controller, which I then striped together
>>> with LVM: lvcreate -n data_lv -i 2 -I 64 mylv -L 60G (expandable
>>> under lvm2). That gives me a stripe size of 64. Running pgbench with
>>> the same scaling factors:
>>>
>>> starting vacuum...end.
>>> transaction type: TPC-B (sort of)
>>> scaling factor: 100
>>> number of clients: 24
>>> number of transactions per client: 12000
>>> number of transactions actually processed: 288000/288000
>>> tps = 1398.907206 (including connections establishing)
>>> tps = 1399.233785 (excluding connections establishing)
>>>
>>> It's also running ext4dev, but this is the 'playground' server,
>>> not the real iron (and I dread trying that on the real iron). In short,
>>> I think the chunksize/stripesize is killing you. Personally, I would
>>> go for 64 or 128 .. that's just my 2c .. feel free to
>>> ignore/scorn/laugh as applicable ;)
>>>
>>>
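
For anyone wanting to try that suggestion on an md RAID 10 rather than an
LVM stripe, the chunk size is set when the array is created. A rough
sketch, with made-up device names (--chunk is in KB):

    mdadm --create /dev/md0 --level=10 --chunk=64 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde

As far as I know an array of that vintage can't have its chunk size
changed in place, so it's worth benchmarking 64 vs. 128 before loading
real data.
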
>> Stef - I suspect that your (quite high) tps is because your SATA disks
>> are not honoring the fsync() request for each commit. SCSI/SAS disks
>> tend, by default, to flush their cache on fsync - ATA/SATA tend not to.
>> Some filesystems (e.g. xfs) will try to work around this with write
>> barrier support, but it depends on the disk firmware.
>
> This has not been true for a while now. SATA disks will flush their
> write cache when told, and properly adhere to write barriers. Of course,
> not all file systems send the right write barrier and flush commands
> to SATA drives (UFS for example, and older versions of ext3).
>
> It may be the other way around: your SAS drives might have the write cache
> disabled for no good reason other than to protect against file systems that
> don't issue those commands correctly.
>

A little extra info here: md, LVM, and some other tools do not allow the
file system to use write barriers properly, so those are on the bad list
for data integrity with SAS or SATA write caches and no battery back-up.
However, this is NOT an issue for the postgres data partition: data fsync
still works fine; it's the file system journal that might see out-of-order
writes. For the xlogs, write barriers are not important either - only an
fsync() that doesn't lie.
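
If you want to see what a given drive is actually doing, the write cache
state can be queried from Linux. A rough sketch - it assumes hdparm and
sdparm are installed, and the device names here are made up:

    # SATA: show the volatile write cache setting, and turn it off if
    # you have neither working barriers nor a battery-backed cache
    hdparm -W /dev/sda
    hdparm -W0 /dev/sda

    # SAS/SCSI: show the WCE (write cache enable) bit
    sdparm --get=WCE /dev/sdb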

As an additional note, ext4 checksums its journal blocks, so it is
resistant to out-of-order writes causing trouble. The test being compared
against here was run on ext4, and the speed increase is most likely partly
due to that.
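
On kernels of this era the journal checksumming, like the barrier
behavior, is a mount option. Roughly, with a made-up device and mount
point (option names may vary between ext4dev versions):

    mount -o barrier=1,journal_checksum /dev/myvg/data_lv /srv/pgdata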

>>
>> Thanks for your reply!
>>
>> Mark
>>
