Re: [PERFORM] Quad processor options - summary

From: Hadley Willan <hadley(dot)willan(at)deeperdesign(dot)co(dot)nz>
To: Bjoern Metzdorf <bm(at)turtle-entertainment(dot)de>
Cc: James Thornton <james(at)jamesthornton(dot)com>, pgsql-performance(at)postgresql(dot)org, pgsql-admin(at)postgresql(dot)org
Subject: Re: [PERFORM] Quad processor options - summary
Date: 2004-05-13 22:59:16
Message-ID: 1084489156.5424.31.camel@atlas.sol.deeper.co.nz
Lists: pgsql-admin pgsql-performance

I see you've got an LSI Megaraid card with oodles of Cache. However,
don't underestimate the power of the software RAID implementation that
Red Hat Linux comes with.

We're using RHEL 2.1 and I can recommend Red Hat Enterprise Linux if you
want an excellent implementation of software RAID. In fact we have
found the software implementation more flexible than that of some
expensive hardware controllers. There are also tools that enhance the
base implementation further, making setup and maintenance even easier.
An advantage of the software implementation is being able to RAID by
partition, not necessarily entire disks.

To answer question 1: if you use software RAID, the chunk size is set in
the /etc/raidtab file that is used on initial container creation. 4KB is
the standard, and a LARGE chunk size of 1MB may hurt performance if
you're not continuously writing blocks of that size. If you make it too
big and you constantly need to write out smaller pieces of information,
you will find the disk "always" working, which is an inefficient use of
the blocks. There is some free info around about calculating the ideal
chunk size; try searching Google for "Calculating chunk size for RAID".
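
To make the chunk-size trade-off a bit more concrete, here is a rough
Python sketch (my own illustration, not part of any RAID tool) of which
members of a plain RAID0 stripe a single request touches. The function
name, the 4-disk stripe and the 8KB request (PostgreSQL's default page
size) are assumptions for the example, and the model ignores mirroring,
caching and read-ahead.

```python
KB = 1024


def disks_touched(offset, length, chunk_size, n_disks):
    """Return the set of RAID0 member indexes touched by one request.

    Simplified model (an assumption for illustration): chunk c lives on
    disk c % n_disks, and offset/length/chunk_size are all in bytes.
    """
    if length <= 0:
        return set()
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    if last_chunk - first_chunk + 1 >= n_disks:
        # The request spans at least one full stripe: every disk is involved.
        return set(range(n_disks))
    return {chunk % n_disks for chunk in range(first_chunk, last_chunk + 1)}


# One 8KB page write on a 4-disk stripe
print(sorted(disks_touched(0, 8 * KB, 4 * KB, 4)))        # 4KB chunks
print(sorted(disks_touched(0, 8 * KB, 1024 * KB, 4)))     # 1MB chunks

# One 1MB sequential request on the same stripe
print(sorted(disks_touched(0, 1024 * KB, 4 * KB, 4)))     # 4KB chunks
print(sorted(disks_touched(0, 1024 * KB, 1024 * KB, 4)))  # 1MB chunks
```

In this model a small chunk spreads even a single page over two disks,
while a 1MB chunk keeps a whole 1MB request on one disk. Which of those
you want depends on how your workload actually reads and writes.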

In the software implementation, after setup the raidtab is unnecessary,
as the superblocks of the disks now contain the relevant information.
As for the application knowing any of this: no, the application layers
are entirely unaware of the lower implementation. They simply function
as normal by writing to directories that are now mounted a different
way. The kernel takes care of the underlying RAID writes and syncs.

Point 3 is easy to implement with software RAID under Linux. You simply
partition the drive like normal, mark the partitions you want to "raid"
as type 'fd' ('linux raid autodetect'), then configure /etc/raidtab and
do a mkraid /dev/mdxx, where mdxx is the matching md device for the RAID
setup. You can map them any way you want, but it can get confusing if
you're mapping /dev/sda6 > /dev/sdb8 and calling it /dev/md7.
We've found it easier to make them all line up: /dev/sda6 > /dev/sdb6 >
/dev/md6
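
As a hedged illustration of the "line them up" convention, the Python
sketch below prints a raidtab stanza for /dev/sda6 + /dev/sdb6 ->
/dev/md6. The helper name is made up, RAID level 1 (a mirrored pair) is
my assumption for the example, and the keywords are the raidtools ones as
I remember them, so check them against the raidtab man page on your own
system before relying on them.

```python
def raidtab_stanza(md_device, members, raid_level=1, chunk_size_kb=4):
    """Render one raidtab stanza as text (hypothetical helper).

    Keyword names follow the raidtools raidtab format as recalled here;
    this is an assumption to verify, not an authoritative reference.
    """
    lines = [
        f"raiddev {md_device}",
        f"    raid-level            {raid_level}",
        f"    nr-raid-disks         {len(members)}",
        "    nr-spare-disks        0",
        "    persistent-superblock 1",
        f"    chunk-size            {chunk_size_kb}",
    ]
    for index, partition in enumerate(members):
        # Each member partition is followed by its position in the array.
        lines.append(f"    device                {partition}")
        lines.append(f"    raid-disk             {index}")
    return "\n".join(lines) + "\n"


# Same partition number on both disks and on the md device.
print(raidtab_stanza("/dev/md6", ["/dev/sda6", "/dev/sdb6"]))
```

After writing something like this into /etc/raidtab, the mkraid step
described above creates the array.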

FYI, if you want better performance, use 15K SCSI disks, and make sure
you've got more than 8MB of cache per disk. Also, you're correct in
splitting the drives across the channel, that's a trap for young players
;-)

Bjoern is right to recommend LVM; it will let you dynamically grow the
RAID volume when you add more disks. However, I've no experience
implementing LVM together with the software RAID for Linux, though I
believe it can be done.

The software RAID implementation allows you to stop and start software
RAID devices as desired, add new hot spare disks to the containers as
needed and rebuild containers on the fly. You can even change kernel
options to speed up or slow down the sync speed when rebuilding the
container.
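
For the sync speed, the md driver exposes (as far as I know) two
tunables under /proc/sys/dev/raid, speed_limit_min and speed_limit_max,
both in KB/s. The small Python sketch below only reads them; the function
name and the base-directory parameter are my own, and you should confirm
the paths exist on your kernel.

```python
from pathlib import Path


def read_sync_limits(base="/proc/sys/dev/raid"):
    """Return the md resync speed limits found under `base`, in KB/s.

    The file names are assumed to be speed_limit_min / speed_limit_max;
    any file that is missing is simply left out of the result.
    """
    limits = {}
    for name in ("speed_limit_min", "speed_limit_max"):
        path = Path(base) / name
        if path.exists():
            limits[name] = int(path.read_text().strip())
    return limits


if __name__ == "__main__":
    # Prints an empty dict on systems without the md driver loaded.
    print(read_sync_limits())
```

Raising or lowering a limit is then a matter of writing a new number to
the corresponding file as root.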

Anyway, have fun, cause striping is the hot rod of the RAID
implementations ;-)

Regards.
Hadley

On Fri, 2004-05-14 at 09:53, Bjoern Metzdorf wrote:

> James Thornton wrote:
>
> >> This is what I am considering the ultimate platform for postgresql:
> >>
> >> Hardware:
> >> Tyan Thunder K8QS board
> >> 2-4 x Opteron 848 in NUMA mode
> >> 4-8 GB RAM (DDR400 ECC Registered 1 GB modules, 2 for each processor)
> >> LSI Megaraid 320-2 with 256 MB cache ram and battery backup
> >> 6 x 36GB SCSI 10K drives + 1 spare running in RAID 10, split over both
> >> channels (3 + 4) for pgdata including indexes and wal.
> >
> > You might also consider configuring the Postgres data drives for a RAID
> > 10 SAME configuration as described in the Oracle paper "Optimal Storage
> > Configuration Made Easy"
> > (http://otn.oracle.com/deploy/availability/pdf/oow2000_same.pdf). Has
> > anyone delved into this before?
>
> Ok, if I understand it correctly the paper recommends the following:
>
> 1. Get many drives and stripe them into a RAID0 with a stripe width of
> 1MB. I am not quite sure if this stripe width is to be controlled at the
> application level (does postgres support this?) or if e.g. the "chunk
> size" of the linux software driver is meant. Normally a chunk size of
> 4KB is recommended, so 1MB sounds fairly large.
>
> 2. Mirror your RAID0 and get a RAID10.
>
> 3. Use primarily the fast, outer regions of your disks. In practice this
> might be achieved by putting only half of the disk (the outer half) into
> your stripe set. E.g. put only the outer 18GB of your 36GB disks into
> the stripe set. Btw, is it common for all drives that the outer region
> is on the higher block numbers? Or is it sometimes on the lower block
> numbers?
>
> 4. Subset data by partition, not disk. If you have 8 disks, then don't
> take a 4 disk RAID10 for data and the other one for log or indexes, but
> make a global 8 drive RAID10 and have it partitioned the way that data
> and log + indexes are located on all drives.
>
> They say, which is very interesting, as it is really contrary to what is
> normally recommended, that it is good or better to have one big stripe
> set over all disks available, than to put log + indexes on a separated
> stripe set. Having one big stripe set means that the speed of this big
> stripe set is available to all data. In practice this setup is as fast
> as or even faster than the "old" approach.
>
> ----------------------------------------------------------------
>
> Bottom line for a normal, less than 10 disk setup:
>
> Get many disks (8 + spare), create a RAID0 with 4 disks and mirror it to
> the other 4 disks for a RAID10. Make sure to create the RAID on the
> outer half of the disks (setup may depend on the disk model and raid
> controller used), leaving the inner half empty.
> Use a logical volume manager (LVM), which always helps when adding disk
> space, and create 2 partitions on your RAID10. One for data and one for
> log + indexes. This should look like this:
>
> -----  -----  -----  -----
> | 1 |  | 1 |  | 1 |  | 1 |
> -----  -----  -----  -----  <- outer, faster half of the disk
> | 2 |  | 2 |  | 2 |  | 2 |     part of the RAID10
> -----  -----  -----  -----
> |   |  |   |  |   |  |   |
> |   |  |   |  |   |  |   |  <- inner, slower half of the disk
> |   |  |   |  |   |  |   |     not used at all
> -----  -----  -----  -----
>
> Partition 1 for data, partition 2 for log + indexes. All mirrored to the
> other 4 disks not shown.
>
> If you take 36GB disks, this should end up like this:
>
> RAID10 has size of 36 / 2 * 4 = 72GB
> Partition 1 is 36 GB
> Partition 2 is 36 GB
>
> If 36GB is not enough for your pgdata set, you might consider moving to
> 72GB disks, or (even better) make a 16 drive RAID10 out of 36GB disks,
> which both will end up in a size of 72GB for your data (but the 16 drive
> version will be faster).
>
> Any comments?
>
> Regards,
> Bjoern
