Re: Raid 10 chunksize

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Raid 10 chunksize
Date: 2009-04-02 20:44:20
Message-ID: C5FA71B4.41F5%scott@richrelevance.com
Lists: pgsql-performance


On 4/2/09 1:20 PM, "Scott Carey" <scott(at)richrelevance(dot)com> wrote:
>
> Well, raid block size can be significantly larger than postgres or file
> system block size and the performance of random reads / writes won't get
> worse with larger block sizes. This holds only for RAID 0 (or 10); parity
> is the ONLY thing that makes larger block sizes bad, since there is a
> read-modify-write type operation on something the size of one block.
>
> Raid block sizes smaller than the postgres block are always bad and
> multiply random i/o.
>
> Read an 8k postgres block in an 8MB md raid 0 block, and you read 8k from one
> disk.
> Read an 8k postgres block on an md raid 0 with 4k blocks, and you read 4k from
> two disks.
>

OK, one more thing. An 8k read in an 8MB block size raid array can generate
two reads in the following cases:

Your read is on the boundary of the blocks AND

1: your partition is not aligned with the raid blocks. This can happen if
you partition _inside_ the raid but not if you raid inside the partition
(the latter only being applicable to software raid).
OR
2: your file system block size is smaller than the postgres block size and
the file block offset is not postgres block aligned.
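
To make the cases above concrete, here is a rough sketch (not md's actual
mapping code) that counts which raid blocks a single read touches on a plain
striped layout. The function name chunks_touched, the 4-disk array and the
512-byte partition shift are all assumptions picked for illustration.

```python
KB = 1024
MB = 1024 * KB

def chunks_touched(offset, length, chunk_size, n_disks, partition_offset=0):
    """Return the (disk, chunk_index) pairs one read touches on a simple RAID 0 layout."""
    start = partition_offset + offset   # first byte, measured from the start of the array
    end = start + length - 1            # last byte read, inclusive
    first_chunk = start // chunk_size
    last_chunk = end // chunk_size
    # Chunks are assumed to be dealt out round-robin across the disks.
    return [(c % n_disks, c) for c in range(first_chunk, last_chunk + 1)]

# 8k read inside an 8MB raid block: one disk.
big = chunks_touched(16 * KB, 8 * KB, 8 * MB, n_disks=4)
# 8k read on 4k raid blocks: 4k from each of two disks.
small = chunks_touched(16 * KB, 8 * KB, 4 * KB, n_disks=4)
# Last 8k postgres block before a raid block boundary, partition shifted by 512 bytes.
misaligned = chunks_touched(8 * MB - 8 * KB, 8 * KB, 8 * MB, n_disks=4,
                            partition_offset=512)

print(len(big), len(small), len(misaligned))
```

Only the read that sits right at the boundary of a misaligned partition turns
into two reads with the 8MB block size; every 8k read does with 4k blocks.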

The likelihood of the first condition (the read landing on a block boundary)
is proportional to:

(Postgres block size)/(raid block size)

Hence, for almost all setups with software raid, a larger block size is
optimal, up to the point where the above ratio gets sufficiently small. If the
block size gets too large, then random access is more and more likely to
bias towards one drive over the others and lower throughput.
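
The ratio can be checked by brute force. The sketch below walks every postgres
block in one raid block and counts how many straddle the boundary. It assumes
the raid block size is a whole multiple of the postgres block size and that the
misalignment is smaller than one postgres block; straddle_fraction is a made-up
helper name.

```python
KB = 1024
MB = 1024 * KB

def straddle_fraction(block_size, chunk_size, misalignment):
    """Fraction of block-sized reads that cross a raid block (chunk) boundary."""
    blocks_per_chunk = chunk_size // block_size
    crossing = 0
    for k in range(blocks_per_chunk):
        start = misalignment + k * block_size
        end = start + block_size - 1          # last byte of this postgres block
        if start // chunk_size != end // chunk_size:
            crossing += 1                     # this read is split across two raid blocks
    return crossing / blocks_per_chunk

for chunk in (64 * KB, 1 * MB, 8 * MB):
    print(chunk // KB, "KB raid block:", straddle_fraction(8 * KB, chunk, 512))
```

With any nonzero misalignment exactly one block per raid block is split, so
the fraction comes out as (postgres block size)/(raid block size). With no
misalignment it is zero.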

Obviously, in the extreme case where the block size is the disk size, your
random access would have to cover 100% of the data to get full speed.
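
A quick way to see the bias toward one drive: take a contiguous hot region and
work out how much of it lands on each disk. This is only a sketch under
simplifying assumptions (the region starts at offset 0 of the array, raid
blocks are dealt round-robin, and the 1GB region and 4 disks are arbitrary
numbers); disk_share is a hypothetical helper.

```python
MB = 1024 * 1024
GB = 1024 * MB

def disk_share(working_set, chunk_size, n_disks):
    """Share of a contiguous working set (starting at offset 0) stored on each disk."""
    per_disk = [0] * n_disks
    offset = 0
    while offset < working_set:
        # Take the rest of the current raid block, or whatever is left of the region.
        piece = min(chunk_size, working_set - offset)
        per_disk[(offset // chunk_size) % n_disks] += piece
        offset += piece
    return [b / working_set for b in per_disk]

for chunk in (8 * MB, 512 * MB, 1 * GB):
    print(chunk // MB, "MB raid block:", disk_share(1 * GB, chunk, n_disks=4))
```

With 8MB raid blocks the 1GB region is spread evenly over all four disks. Once
the raid block is as large as the region, a single disk serves all of it.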
