Re: partition question for new server setup

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Whit Armstrong <armstrong(dot)whit(at)gmail(dot)com>
Cc: Craig James <craig_james(at)emolecules(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: partition question for new server setup
Date: 2009-04-29 01:58:51
Message-ID: C61D026B.5640%scott@richrelevance.com
Lists: pgsql-performance


On 4/28/09 5:10 PM, "Whit Armstrong" <armstrong(dot)whit(at)gmail(dot)com> wrote:

> Thanks, Scott.
>
> So far, I've followed a pattern similar to Scott Marlowe's setup. I
> have configured 2 disks as a RAID 1 volume, and 4 disks as a RAID 10
> volume. So, the OS and xlogs will live on the RAID 1 vol and the data
> will live on the RAID 10 vol.
>
> I'm running the memtest on it now, so we still haven't locked
> ourselves into any choices.
>

It's a fine option. The only way to know whether one big volume with
separate partitions is better is to test your actual application, since the
answer is highly dependent on the use case.

> regarding your comment:
>> 6 and 8 disk counts are tough. My biggest single piece of advice is to have
>> the xlogs in a partition separate from the data (not necessarily a different
>> raid logical volume), with file system and mount options tuned for each case
>> separately. I've seen this alone improve performance by a factor of 2.5 on
>> some file system / storage combinations.
>
> can you suggest mount options for the various partitions? I'm leaning
> towards xfs for the filesystem format unless someone complains loudly
> about data corruption on xfs for a recent 2.6 kernel.
>
> -Whit
>

I went with ext3 for the OS (it makes Ops feel a lot better), ext2 for a
separate xlog partition, and xfs for the data.
ext2's drawbacks (no journal, so long fsck runs after a crash) are not
relevant for a small partition holding only xlog data, but they are a
problem for the OS.

For a setup like yours, xlog speed is not going to limit you.
I suggest one partition for the OS with default ext3 mount options, and a
second partition for the postgres install and xlogs (everything except the
data) on ext3 with data=writeback.

As others here have mentioned, ext3 with the default data=ordered on the
xlog partition causes performance issues, but data=ordered is probably the
right choice for the OS. Your xlogs will not be a bottleneck and will
probably be fine either way. Since data= is a mount-time option, you can
switch later without reformatting (ext3 won't change it on a remount, so
unmount the partition and mount it again).

I went with xfs for the data partition and saw no benefit from any mount
option other than 'noatime'. The default xfs settings are fine; the
RAID-specific formatting options (stripe unit and width) mainly help
RAID 5 or 6.
If you go with ext3 for the data partition, make sure it is mounted with
data=writeback and 'noatime'. Both are mount-time options.
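To make the layout above concrete, here is a minimal sketch of matching /etc/fstab entries, printed by a small shell function for review rather than written to /etc/fstab. The device names (/dev/sda1, /dev/sda2, /dev/sdb1) and mount points are hypothetical placeholders, and it assumes pg_xlog has been relocated onto the second partition; the options are the ones suggested above: defaults (data=ordered) for the OS on ext3, data=writeback for the install/xlog partition on ext3, and noatime for the data partition on xfs.

```shell
#!/bin/sh
# Print a candidate /etc/fstab fragment for the three-partition layout.
# Nothing is written to /etc/fstab; review the output and adapt it.
# ASSUMPTION: device names and mount points are hypothetical placeholders.
#   /dev/sda1 -> OS, ext3, default options (data=ordered)
#   /dev/sda2 -> postgres install + xlogs (pg_xlog relocated here), ext3, data=writeback
#   /dev/sdb1 -> database data, xfs, noatime
fstab_fragment() {
    cat <<'EOF'
# <device>  <mount point>         <type>  <options>        <dump>  <pass>
/dev/sda1   /                     ext3    defaults         1       1
/dev/sda2   /var/lib/pgsql        ext3    data=writeback   0       2
/dev/sdb1   /var/lib/pgsql/data   xfs     noatime          0       2
EOF
}

fstab_fragment
```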

I said it before, but I'll repeat it: don't neglect the OS readahead setting
for each device, especially the data device. Something like

    /sbin/blockdev --setra 8192 /dev/sd<X>

(where <X> is the right letter for your data RAID volume) will have a big
impact on larger sequential scans. The setting does not survive a reboot, so
it has to go in rc.local or whatever script runs after boot on your distro.
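A minimal sketch of such an rc.local fragment follows. blockdev --setra counts 512-byte sectors, so 8192 sectors is 4 MiB of readahead; the helper converts the value to KiB and reads it back with --getra so the boot log shows what the kernel actually applied. The function names are hypothetical, and /dev/sdb in the commented call is an assumed name for the data RAID volume; substitute your own device.

```shell
#!/bin/sh
# rc.local fragment: raise readahead on the data volume after every boot,
# because the blockdev readahead setting is not persistent.

# Convert a readahead value given in 512-byte sectors to KiB.
ra_kib() {
    echo $(( $1 * 512 / 1024 ))
}

# Set readahead (in 512-byte sectors) on a block device, then report the
# value the kernel reports back. Requires root.
set_readahead() {
    dev=$1
    sectors=$2
    /sbin/blockdev --setra "$sectors" "$dev" || return 1
    echo "$dev readahead: $(/sbin/blockdev --getra "$dev") sectors ($(ra_kib "$sectors") KiB requested)"
}

# In rc.local, call it once per device. ASSUMPTION: /dev/sdb is the data
# RAID volume (hypothetical name).
# set_readahead /dev/sdb 8192
```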

>
> On Tue, Apr 28, 2009 at 7:58 PM, Scott Carey <scott(at)richrelevance(dot)com> wrote:
>>>>
>>>> server information:
>>>> Dell PowerEdge 2970, 8 core Opteron 2384
>>>> 6 1TB hard drives with a PERC 6i
>>>> 64GB of ram
>>>
>>> We're running a similar configuration: PowerEdge 8 core, PERC 6i, but we
>>> have 8 of the 2.5" 10K 384GB disks.
>>>
>>> When I asked the same question on this forum, I was advised to just put
>>> all 8 disks into a single RAID 10 and forget about separating things. The
>>> performance of a battery-backed PERC 6i (you did get a battery-backed
>>> cache, right?) with 8 disks is quite good.
>>>
>>> In order to separate the logs, OS and data, I'd have to split off at least
>>> two of the 8 disks, leaving only six for the RAID 10 array. But then my
>>> xlogs would be on a single disk, which might not be safe. A more robust
>>> approach would be to split off four of the disks, put the OS on a RAID 1,
>>> the xlog on a RAID 1, and the database data on a 4-disk RAID 10. Now I've
>>> separated the data, but my primary partition has lost half its disks.
>>>
>>> So, I took the advice and just made one giant 8-disk RAID 10, and I'm very
>>> happy with it. It has everything: Postgres, OS and logs. But since the
>>> RAID array is 8 disks instead of 4, the net performance seems to be quite
>>> good.
>>>
>>
>> If you go this route, there are a few risks:
>> 1.  If everything is on the same partition/file system, fsyncs from the
>> xlogs may cross-pollute to the data.  Ext3 is notorious for this, though
>> data=writeback limits the effect, and you especially might not want
>> data=writeback on your OS partition.  I would recommend that the OS, data,
>> and xlogs etc. live on three different partitions regardless of the number
>> of logical RAID volumes.
>> 2. Cheap raid controllers (PERC, others) will see fsync for an array and
>> flush everything that is dirty (not just the partition or file data), which
>> is a concern if you aren't using it in write-back with battery backed cache,
>> even for a very read heavy db that doesn't need high fsync speed for
>> transactions.
>>
>>> But ... your mileage may vary.  My box has just one thing running on it:
>>> Postgres.  There is almost no other disk activity to interfere with the
>>> file-system caching.  If your server is going to have a bunch of other
>>> activity that generates a lot of non-Postgres disk activity, then this advice
>>> might not apply.
>>>
>>> Craig
>>>
>>
>> 6 and 8 disk counts are tough.  My biggest single piece of advice is to have
>> the xlogs in a partition separate from the data (not necessarily a different
>> raid logical volume), with file system and mount options tuned for each case
>> separately.  I've seen this alone improve performance by a factor of 2.5 on
>> some file system / storage combinations.
>>
>>>
>>> --
>>> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance
>>>
>>
>>
>
